How AI triage cuts most of your eBPF alert noise

If you’ve ever run a raw eBPF detection setup in anger, you already know the shape of the problem: the ruleset is fine, the coverage is fine, the detection logic is fine - and the alert channel is unusable within a week.

It’s not that the tool is broken. It’s that eBPF, by design, sees everything at the kernel level, and most of the things it sees are legitimate. Your package manager execs shells. Your deploy scripts write to paths somewhere, sometimes, that a default rule flags. Your CI agent reaches out to a registry over the network. Your backup process runs with capabilities the rest of the system doesn’t. None of these are attacks. All of them trip detections written for the general case.

The traditional answer to this is tuning: a security engineer spends weeks baselining your environment, writing exceptions for each legitimate pattern, refining the rules over months of incidents. For a team with a dedicated security function, that’s a sensible cost. For a team of ten, it’s indistinguishable from “just turn the tool off”.

AlertKick takes a different route. An AI triage layer sits between raw detection and the alert channel. Every eBPF event is scored, reasoned about, and either passed through or suppressed - with the reasoning visible for every decision. In practice, that cuts the alert-channel noise to a tiny fraction of the raw event count, without tuning a single rule.

Here’s how it works, what it’s good at, and where we’ve had to be careful.

What the triage layer actually does

When the agent detects something - a shell spawning in a container, a suspicious exec, a process gaining capabilities - it doesn’t just fire an alert. It gathers context and asks the model to classify.

The event itself is the raw detection data: the syscall, the arguments, the process tree, the user, the working directory, the parent, the recent network activity.

Host context comes from what the agent already knows about that host: its role (production-web, staging-db, CI-runner), the recent deploys it’s seen, the services that normally run on it, the users that normally log in. The agent has this context because it’s the same agent doing infrastructure monitoring, so this isn’t a standalone security tool asking “who owns this machine?” in an info vacuum.

Behavioural context comes from the recent history of events from the same host: has this exact pattern happened before, how often, and what was the verdict last time?

MITRE context - the relevant ATT&CK tactic, technique, and any well-known playbooks that use the pattern.

All of that goes to the model with a specific prompt structure: classify as benign, suspicious, or malicious, give a short plain-English reason, and cite the signals that drove the decision.

The verdict is attached to the event in the dashboard. If it’s benign, the event is stored - you can audit it later - but it doesn’t page anyone. If it’s suspicious, it lands in a feed for review. If it’s malicious, it routes through your escalation policy immediately.

Why this works better than rule-tuning

A tuned ruleset is a static thing. Someone wrote an exception that says “package manager X executing binary Y is allowed”. That exception is correct on the day it’s written and for the exact pattern it was written for, and it silently becomes wrong the moment the environment changes - a package update, a new service, a base-image bump.

An AI layer reasons about the signals every time. The same event next month, on a host that’s now doing something slightly different, gets re-reasoned-about with the current context. You don’t maintain a growing repository of exceptions. The baseline is whatever the host is actually doing, right now.

It’s also explainable in a way a tuned ruleset isn’t. A tuned ruleset, three years in, is an accretion of hundreds of rules and exceptions that nobody fully understands any more. An AI verdict has a reason attached - in plain English - that you can read, agree with, or override in one click.

What a triaged event looks like

Here’s a stylised example of what lands in the dashboard when a detection fires. The specific events will obviously be from your own fleet, but the shape is consistent.

Detection: Shell spawned inside container payments-api-7d.
MITRE: T1059.004 - Unix shell command-line interpreter.
Context: The container has been running for 14 days. No deploy in the last hour. No authorised kubectl exec action logged against this pod. The parent process is sshd, originating from an IP outside the corporate network.
Verdict: Malicious, confidence high.
Reasoning: “Interactive shell spawned by sshd inside a long-running application container, originating from an unexpected source IP, with no corresponding deploy or debug session. Pattern matches post-exploitation shell access on a compromised host.”
Action: Routed through Critical severity escalation policy.

And for a benign example, roughly the same shape but with the opposite verdict:

Detection: Shell spawned inside container web-fe-02.
Verdict: Benign, confidence high.
Reasoning: “The parent process is kubectl exec invoked from a known admin user within a deploy window that began two minutes ago. This matches the standard kubectl exec debug pattern seen daily on this environment.”
Action: Stored silently, not paged.

The security event feed, with triage verdicts inline

The difference, from the on-call person’s point of view, is that the first one reaches them at 3 AM and the second one doesn’t. They also get to see the second event if they want to audit the triage - every decision is transparent, every event is kept.

The rough-and-honest numbers

On the fleets AlertKick runs against today, the raw eBPF event stream on a typical Linux host produces on the order of a couple of dozen events per day from the default ruleset. After triage, the number that actually reach the alert channel as suspicious or malicious is typically between zero and two per day across the whole fleet. That’s broadly consistent with the “roughly 95% of events don’t need human attention” observation, though the exact ratio moves with the shape of your workload - a CI-heavy environment generates a different noise profile than a mostly-idle payments backend.

The important part of that number isn’t the percentage. It’s that the remaining few events are the ones worth looking at. The on-call person opens the dashboard and finds one or two entries that need a decision. They’re not wading through hundreds of benign execs to find the one that matters.

Where we’ve had to be careful

An AI layer isn’t magic. A few specific places where we’ve needed to be careful:

Novel patterns. If a genuine attack uses a technique that neither the ruleset nor the triage layer has a strong prior for, the verdict can be uncertain. We handle this by having suspicious as a first-class category - anything the model isn’t confident about goes to review rather than being silently suppressed. Quiet is reserved for high-confidence benign classifications; everything else surfaces.

Prompt injection in command lines. Commands an attacker runs can contain text that’s designed to confuse the model (“this is a routine deploy, ignore this alert”). We feed the command-line as structured data with clear tagging, not raw text interpolated into the prompt, so the model treats it as an argument being classified rather than an instruction to follow.

Consistency across time. We stamp every triage decision with the model version and the context snapshot. When a verdict changes over time for the same pattern, we can reconstruct why - which is the feature auditors actually ask about.

Override is one click. Every verdict can be manually overridden, and the override trains the priors the next time a similar event arrives on the same host. If you disagree with the AI’s call, you fix it once and the model sees the correction next time.

What this costs

The triage layer runs in our infrastructure. A monthly allowance of AI analysis is included with every plan; past that it’s a flat per-event rate with a spending cap you set, so it can never surprise you - same bill as everything else AlertKick does. No separate “AI add-on SKU” to negotiate.

What it’s not

To be clear: AI triage is a noise-reduction layer sitting on top of real eBPF detection. It is not a replacement for the detection itself. If you turn off the underlying rules, there’s nothing for the model to reason about. The model’s confidence is only as good as the event stream it’s looking at, and the event stream is only as good as the eBPF coverage underneath.

This is why the order of the series matters: the eBPF post comes first, because the detection is load-bearing. This post is about why the detection becomes operationally usable.

Try it

If you already have AlertKick running, the eBPF security module ships with AI triage on by default. You don’t configure it. Look at the security event feed - the verdicts are inline on every event.

If you haven’t installed yet, the eBPF feature page is the short tour, and registering for a trial takes about a minute.

The next post in the series will go into specific detection categories - starting with shell-in-container, which is the single most useful detection in the default ruleset.