Your detection pipeline is green. That doesn’t mean your detections work.

Detection-as-Code

Ethan Smart

June 9, 2026

5 min

The thing about detection-as-code is that it gives you a really satisfying green checkmark.

The rule compiles. The unit test passes. The PR gets a thumbs-up from a colleague who actually read it. GitHub Actions ships the change to Splunk or Sentinel or Chronicle, the deploy notification lands in the channel, and you move on. Pipeline: green. Reviewer: happy. Detection: in production.

Two months later something happens that the rule was supposed to catch. The rule doesn’t fire. You find out because someone else’s tool, or someone else’s incident, surfaces what yours missed.

I’ve watched this happen enough times that it’s worth naming. Detection-as-code, as most teams practice it, answers the question “did the rule deploy?” That’s a great question. It just isn’t the same question as “does the rule work?”

The gap between those two questions is where almost every silent detection failure I’ve seen lives.

The three failure modes

There are roughly three ways a deployed detection silently stops working. They’re not exotic and they’re not edge cases. They’re the load-bearing failure modes of the discipline, and most pipelines don’t notice any of them.

1. The log source quietly died.

The rule depends on a feed: Windows event logs, EDR telemetry, some SaaS audit log, whatever. The feed stopped flowing two weeks ago. The connector is degraded, someone rotated an API key, or the agent stopped reporting after a kernel update. No alert, because there’s nothing to alert on. The SOC queue is quiet. Quiet reads as good.

Most SIEMs ship a log-source-health dashboard. Most SIEMs don’t ship anyone who watches it. Vendors generally recommend alerting on single-digit-hour silence windows; in practice, the dashboard sits next to the disk-space dashboard nobody opens.

Graylog put this well in a piece they ran in May: “The most dangerous SIEM failure mode isn’t a visible error, it’s a silent one… organizations operate with false confidence.” That’s the right frame. False confidence is the cost.
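
A minimal sketch of the silence check described above, in Python. Everything here is an assumption for illustration: the source names, the four-hour window, and the `last_seen` snapshot, which in practice would come from whatever "latest event per source" query your SIEM supports.

```python
from datetime import datetime, timedelta, timezone


def silent_sources(last_seen, max_silence, now=None):
    """Return the names of sources whose newest event is older than max_silence."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, ts in last_seen.items()
        # A source that has never reported (None) counts as silent.
        if ts is None or now - ts > max_silence
    )


# Hypothetical snapshot of the newest event timestamp per log source.
now = datetime(2026, 6, 9, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "windows_security": now - timedelta(minutes=5),
    "edr_telemetry": now - timedelta(days=14),  # stopped two weeks ago
    "saas_audit": None,                         # never reported
}

print(silent_sources(last_seen, timedelta(hours=4), now=now))
# ['edr_telemetry', 'saas_audit']
```

The check itself is trivial. The hard part is that something has to run it on a schedule and page a human, which is exactly what the unwatched dashboard fails to do.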

2. The schema drifted underneath the rule.

The feed is still flowing. The fields the rule depends on are no longer named what the rule thinks they’re named.

Vendors do this constantly. Microsoft renames an Entra ID field. An EDR vendor splits one field into two during a major version bump. A SaaS audit log adds a parent object and the field you queried as actor.email is now actor.user.email. The rule’s syntax is still valid. The compile still succeeds. The unit test still passes because the test fixture is frozen at whatever schema the rule was written against, and nobody updated it.

Production sees the new schema. The rule sees nothing.

The SigmaHQ ecosystem makes this worse in a specific way: when a Sigma rule compiles down to your SIEM’s query language, the field-name mapping is doing more work than people realize. Endpoint logs might call a field process.command_line while your SIEM stores it as CommandLine. The Sigma rule abstracts over this. When the abstraction is right, you don’t think about it. When it’s wrong, you don’t think about it either, because nothing is loud enough to make you.
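
One way to make this kind of drift loud is to compare the field paths a rule depends on against a recent live sample. The sketch below assumes events are available as nested dictionaries; `flatten`, `drift`, and the sample events are hypothetical names and data, not part of Sigma or any SIEM's API.

```python
def flatten(event, prefix=""):
    """Map each dotted leaf path in a nested event to the type name of its value."""
    out = {}
    for key, value in event.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, f"{path}."))
        else:
            out[path] = type(value).__name__
    return out


def drift(rule_fields, fixture_event, live_event):
    """Report rule fields that exist in the fixture but are missing or retyped live."""
    old, new = flatten(fixture_event), flatten(live_event)
    report = {}
    for field in rule_fields:
        if field not in old:
            continue  # the fixture never covered this field; nothing to compare
        if field not in new:
            # Also catches a string that became a nested object,
            # because its leaf path no longer exists.
            report[field] = "missing"
        elif old[field] != new[field]:
            report[field] = f"type changed: {old[field]} -> {new[field]}"
    return report


# The frozen fixture versus what the vendor sends today.
fixture = {"actor": {"email": "a@example.com"}, "action": "login"}
live = {"actor": {"user": {"email": "a@example.com"}}, "action": "login"}

print(drift(["actor.email", "action"], fixture, live))
# {'actor.email': 'missing'}
```

This only works if the live sample is actually live. Run it against the same frozen fixture and it will report no drift, forever.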

3. The test fixture and the live query are not the same thing.

This one is subtle, and it’s the one DaC enthusiasts (me included) undersell.

You write a Sigma rule. You author a fixture event in JSON. You write a unit test that asserts the rule matches the fixture. CI runs. Green.

What the test verified: the rule’s logic, expressed in Sigma, matches the fixture, expressed in JSON.

What the test did not verify: that the rule, after compiling down to SPL or KQL or YARA-L, running in the SIEM, against the field normalization the SIEM applies, with the timestamp parsing the SIEM applies, will match the equivalent live event sourced through your live pipeline.

A lot of joints in that sentence. Every joint is a place where things can go wrong. Timezone handling. Case sensitivity that differs between the abstract rule and the compiled query. A field your normalizer trims that the rule expected to be untrimmed. Wildcard (*) semantics that the SIEM rewrites in a way the Sigma backend doesn’t quite mirror.

The unit test is a real test. It just isn’t a test of the question you actually have, which is “will this rule fire on the real event when it shows up?”
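
A toy illustration of one such joint, in Python. The rule and the normalizer are stand-ins I made up for the example: a case-sensitive substring match, and an ingest step that renames the field and lowercases and trims the value. No real Sigma backend or SIEM is involved.

```python
def rule_matches(event):
    """Stand-in for the abstract rule: case-sensitive substring match on one field."""
    return "-EncodedCommand" in event.get("CommandLine", "")


def normalize(event):
    """Hypothetical ingest pipeline: renames the field, trims and lowercases the value."""
    return {"process.command_line": event["CommandLine"].strip().lower()}


# The fixture the unit test was written against.
fixture = {"CommandLine": "powershell.exe -EncodedCommand SQBFAFgA "}

print(rule_matches(fixture))             # True: the unit test is green
print(rule_matches(normalize(fixture)))  # False: what the SIEM stores never matches
```

Both calls exercise the same rule logic. Only the second one resembles what production sees, and it is the one the unit test never runs.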

A composite war story

A team I worked with had a high-confidence detection for a specific lateral-movement TTP. Sigma rule, peer-reviewed, deployed through a clean DaC pipeline. The test suite was good. The fixture was a real event captured during a previous red-team exercise.

The detection fired during the red team. It fired during purple-team exercises after that. Everyone was satisfied.

Eight months later, the EDR vendor pushed a minor version. One of the fields the rule depended on changed from a single string into a nested object. The Sigma backend the team used didn’t warn, because the field still technically existed; what changed was its shape. The compiled SPL still ran. It just returned zero rows.

The team found out four months later, during an unrelated exercise, that the rule had been dark for almost a third of the year. The pipeline had been green the whole time. The unit test had been green the whole time. The dashboard showed the rule as healthy.

The team was good. The pipeline was good. The failure wasn’t laziness. The failure was that nothing in the system was answering the right question.

What would actually help

The answer isn’t “more unit tests,” and it isn’t “buy a BAS platform.” BAS validates that an attack runs and that some control catches it. Useful. Also not the same as validating that this rule, the one I just changed, fires on the events that should trigger it, in the SIEM the team actually uses.

What would help, concretely:

Assert that the log source the rule depends on has produced events in the last N hours, scoped per-rule rather than per-source.
Compare the schema the rule was written against to the schema currently flowing, and flag drift the moment it happens, not the moment someone notices the dashboard.
Confirm that the rule, as deployed, matches a known positive event when that event shows up live. Not against a fixture. Against the SIEM.

None of those are easy. The first is the closest to a solved problem and most teams still don’t do it well. The second and third are where almost no team I’ve talked to has a confident answer.
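
For the third item, the rough shape is a canary: send a tagged known-positive event through the real pipeline and wait for the deployed rule to alert on it. The sketch below is a Python outline under stated assumptions. `send_event` and `find_alerts` are hypothetical callables you would have to implement against your own ingest path and SIEM API, and `FakeSiem` exists only so the example runs on its own.

```python
import time
import uuid


def validate_rule_live(send_event, find_alerts, rule_id,
                       timeout_s=300.0, poll_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Send a tagged known-positive event, then poll until the rule alerts or time runs out."""
    marker = f"canary-{uuid.uuid4()}"  # unique tag so old alerts cannot satisfy the check
    send_event(marker)
    deadline = clock() + timeout_s
    while True:
        if any(marker in alert for alert in find_alerts(rule_id)):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


class FakeSiem:
    """In-memory stand-in for the ingest path and alert API (illustration only)."""

    def __init__(self, rule_works):
        self.rule_works = rule_works
        self.alerts = []

    def send(self, marker):
        # A working rule produces an alert that carries the marker.
        if self.rule_works:
            self.alerts.append(f"rule fired on {marker}")

    def find(self, rule_id):
        return list(self.alerts)


healthy, dark = FakeSiem(rule_works=True), FakeSiem(rule_works=False)
no_wait = {"timeout_s": 0, "sleep": lambda seconds: None}

print(validate_rule_live(healthy.send, healthy.find, "lateral-movement-01", **no_wait))  # True
print(validate_rule_live(dark.send, dark.find, "lateral-movement-01", **no_wait))        # False
```

The value is in the `False`. A rule that has gone dark returns it on the next scheduled run, not four months later during an unrelated exercise.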

(One paragraph about us, then back to the post.)

This is the problem Rilevera works on: end-to-end validation that deployed detections fire on the events they should, continuous log-source health tied to the rules that depend on each source, and schema-drift recognition when an upstream feed changes shape underneath a rule. We’re not the only people thinking about this (see the Graylog and Sigma-to-SIEM pieces above), but it’s the area we live in. Take that for whatever it’s worth.

The pipeline isn’t the system

Detection-as-code is good. I’m not arguing against it. I’m arguing that the pipeline is one part of a larger system, and the part most teams have skipped is the part that answers the question they actually care about.

A green checkmark feels like an answer. Sometimes it’s just a beautifully formatted way of changing the subject.
