From Attack Simulation to SIEM Rule: Deterministic Detection-as-Code Synthesis with Probe-Level Traceability

Alexandre Cristovão Maiorano

arxiv: 2606.05252 · v1 · pith:75FGJQCTnew · submitted 2026-06-03 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-28 05:36 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords: SIEM, Sigma rules, attack simulation, deterministic synthesis, detection-as-code, traceability, OWASP

The pith

When probes come from a locked corpus, each bypassed finding maps deterministically to a starter Sigma rule with full traceability back to the probe.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the manual step of turning breach-and-attack-simulation (BAS) findings into SIEM detection rules can be partially automated by a deterministic function, provided the probes come from a locked corpus. Because each probe has a stable identifier, the function looks up a template from a library of 23 entries and produces a starter Sigma rule that carries a back-reference to the original finding and its MITRE ATT&CK technique. All 17 emitted rules parse and convert to Splunk and Elasticsearch; replayed through a live OpenSearch SIEM, the LLM rules fire on 30% of a held-out AdvBench subset and 14% of HarmBench, at 7.7% false positives on a benign baseline. The result is a reproducible path from simulation to deployable starter rule that does not rely on generative AI.

Core claim

The paper presents a deterministic synthesis function that takes each bypassed-probe finding from a locked corpus and produces a starter Sigma rule via template lookup, preserving a typed traceback to the originating probe and MITRE ATT&CK technique; on the tested corpora this function succeeds for every finding and yields rules that are valid across multiple SIEM backends.

What carries the argument

The deterministic synthesis function, which indexes a template library by OWASP categories and attaches probe identifiers for traceability.
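A minimal sketch of what such a lookup-based synthesis function could look like, using only the Python standard library. The `Finding` fields, the single `LLM01` template, the `probe:` reference scheme, and the technique ID `T1190` are all illustrative assumptions; the paper's actual schema and 23-entry library are not reproduced in this summary.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical BAS finding record; field names are assumptions."""
    probe_id: str          # stable identifier from the locked corpus
    owasp_category: str    # template index key, e.g. "LLM01"
    attack_technique: str  # MITRE ATT&CK technique ID (placeholder value below)
    bypassed: bool         # True when the probe got past existing defenses


# Illustrative stand-in for the template library (the paper's has 23 entries).
TEMPLATES = {
    "LLM01": {
        "logsource": {"category": "application", "product": "llm_gateway"},
        "detection": {
            "selection": {"prompt|contains": ["ignore previous instructions"]},
            "condition": "selection",
        },
    },
}


def synthesize(finding: Finding) -> Optional[dict]:
    """Map one finding to a starter rule by pure template lookup."""
    if not finding.bypassed:
        return None  # only bypassed probes indicate a detection gap
    # A missing category raises KeyError, which surfaces a library gap early.
    template = copy.deepcopy(TEMPLATES[finding.owasp_category])
    return {
        "title": f"Starter rule for probe {finding.probe_id}",
        "status": "experimental",
        "references": [f"probe:{finding.probe_id}"],  # assumed back-reference scheme
        "tags": [f"attack.{finding.attack_technique.lower()}"],
        "logsource": template["logsource"],
        "detection": template["detection"],
        "level": "medium",
    }


finding = Finding("LLM01-0001", "LLM01", "T1190", bypassed=True)
rule = synthesize(finding)
assert rule == synthesize(finding)  # same input, same output
```

The function has no randomness, clock, or network dependency, which is the property that makes the output re-derivable from the corpus and templates alone.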

If this is right

Every bypassed-probe finding produces a starter rule.
All 17 emitted rules parse and convert to Splunk and Elasticsearch backends.
Replayed through a live OpenSearch SIEM, the LLM rules fire on 30% of a held-out AdvBench subset and 14% of HarmBench, at 7.7% false positives on a benign baseline.
The path from finding to rule is byte-stable and re-derivable from the published corpus and templates alone.
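The byte-stability claim can be checked mechanically by hashing a canonical serialization of each rule and comparing digests across runs. The sketch below is one way to do that, assuming rules are held as plain dictionaries; it serializes to JSON to stay within the standard library, whereas the paper emits Sigma (YAML), so the function names and the serialization choice are assumptions.

```python
import hashlib
import json


def canonical_bytes(rule: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so key order and
    whitespace cannot change the output bytes."""
    text = json.dumps(rule, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def rule_digest(rule: dict) -> str:
    """SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonical_bytes(rule)).hexdigest()


# Two dictionaries with the same content but different insertion order.
rule_a = {
    "title": "Starter rule",
    "detection": {"condition": "selection", "selection": {"prompt|contains": ["x"]}},
}
rule_b = {
    "detection": {"selection": {"prompt|contains": ["x"]}, "condition": "selection"},
    "title": "Starter rule",
}
assert rule_digest(rule_a) == rule_digest(rule_b)
```

Publishing the digests alongside the corpus and templates would let a third party confirm that a re-derived rule set matches the original byte for byte.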

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar deterministic mappings could be developed for other detection formats beyond Sigma.
The approach trades generative flexibility for guaranteed reproducibility and auditability.
Extending the template library would allow coverage of additional attack categories without changing the core method.

Load-bearing premise

The probes must come from a locked corpus that provides stable identifiers for each finding.

What would settle it

Running the synthesis function on the published 17-probe LLM corpus and finding that it fails to emit exactly 17 valid, traceable rules or that the rules do not convert to Splunk and Elasticsearch backends.

Figures

Figures reproduced from arXiv: 2606.05252 by Alexandre Cristovão Maiorano.

**Figure 2.** The LLM01 rule set's AdvBench held-out fire rate across the four evaluation stages in which it appears throughout Section 4: the v1 keyword-only rubric (0/50, a baseline failure), the v2 keyword+regex rubric as a Python prototype (31/50) and integrated in the live engine (30/50, synthetic-log harness), and the same v2 rules replayed through a real OpenSearch+Lucene SIEM (15/50). The four rates are one rule set …
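For readers comparing the Figure 2 counts with the percentages quoted elsewhere on this page, the conversion is simple arithmetic. The stage labels below are paraphrased from the caption; only the counts come from the source.

```python
from fractions import Fraction

# (stage label, rules fired, held-out prompts) as given in the Figure 2 caption.
STAGES = [
    ("v1 keyword-only rubric", 0, 50),
    ("v2 keyword+regex, Python prototype", 31, 50),
    ("v2 in live engine, synthetic-log harness", 30, 50),
    ("v2 replayed through OpenSearch+Lucene", 15, 50),
]


def fire_rate_percent(hits: int, total: int) -> float:
    """Exact percentage of held-out prompts on which at least one rule fired."""
    return float(Fraction(hits, total) * 100)


for label, hits, total in STAGES:
    print(f"{label}: {hits}/{total} = {fire_rate_percent(hits, total):.0f}%")
# v1 keyword-only rubric: 0/50 = 0%
# v2 keyword+regex, Python prototype: 31/50 = 62%
# v2 in live engine, synthetic-log harness: 30/50 = 60%
# v2 replayed through OpenSearch+Lucene: 15/50 = 30%
```

The 15/50 real-SIEM figure is consistent with the 30% AdvBench rate reported in the abstract, while the synthetic-log harness rate is twice as high.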

**Figure 3.** Real-SIEM replay through OpenSearch + Lucene: union match rate per cohort. The two left bars (AdvBench, HarmBench) are operator-facing detection rates on held-out attack corpora; the two right bars (Benign-LLM, Benign-Web) are the false-positive surface on benign baselines. The per-cohort match records are included in the replication package.

Original abstract

Security teams routinely simulate attacks against their own systems to check whether their monitoring would catch a real intruder. These Breach-and-Attack-Simulation (BAS) tools surface findings, but the security information and event management (SIEM) systems that watch production need detection rules -- and today a human bridges that gap by hand, reading each finding and writing the corresponding Sigma rule (a vendor-neutral detection format). We show this translation can be partially automated when probes are drawn from a locked corpus, so each finding carries a stable identifier back to the originating probe. We describe a deterministic synthesis function that maps each finding to a starter Sigma rule through a small template library (N=23, indexed by categories from the OWASP LLM and Web Top 10), with a back-reference to the originating finding and its MITRE ATT&CK technique. On two locked corpora (17-probe LLM, 23-probe Web), every bypassed-probe finding yields a starter rule, and all 17/17 emitted rules parse and convert to Splunk and Elasticsearch backends. Replayed through a live OpenSearch SIEM, the LLM rules fire on 30% of a held-out AdvBench subset and 14% of HarmBench at 7.7% false positives on a benign baseline; the Web side is validated structurally, not against a held-out attack set. The contribution is a verifiable, byte-stable path from BAS finding to operator-deployable starter rule, re-derivable from the published corpus and template library alone -- trading the breadth of LLM-generative methods for exact reproducibility and a typed traceback from any fired alert to the originating probe.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a deterministic template method to turn locked-corpus BAS findings into traceable starter Sigma rules, with solid reproducibility under that precondition.

The main takeaway is that this work shows how to automate the BAS-to-Sigma-rule step deterministically when probes come from a fixed corpus. Each finding keeps a stable ID back to its probe, which lets the synthesis pull in the right template and add a MITRE ATT&CK back-reference.

What the paper actually delivers is a small fixed template library (23 entries, keyed to OWASP categories) plus a synthesis function that produces rules for every bypassed finding in the two test corpora. All 17 LLM rules parse and convert to Splunk and Elasticsearch. The OpenSearch replay gives concrete numbers: 30% detection on AdvBench held-out, 14% on HarmBench, at 7.7% false positives on benign traffic. The Web side is checked only for structure. The byte-stable re-derivability from the published corpus and templates is a genuine practical advantage over generative approaches.

The soft spots are straightforward. Everything rests on the locked-corpus precondition; drop that and the deterministic mapping no longer holds. The template set is small and category-specific, so new probe types will need new templates. The Web validation lacks the empirical firing test given to the LLM rules. No details on edge-case handling appear in the abstract.

This is for security-ops readers who already run BAS tools against known probe sets and want reproducible starter rules they can audit and deploy. It will not interest people looking for general-purpose detection generation. The work deserves a serious referee because the claims are scoped clearly and the reported results are verifiable under the stated conditions.

Referee Report

0 major / 2 minor

Summary. The paper claims that, for probes drawn from a locked corpus providing stable identifiers, a deterministic synthesis function using a fixed library of 23 templates (indexed by OWASP LLM and Web Top 10 categories) can map each bypassed-probe BAS finding to a starter Sigma rule that includes a back-reference to the probe and its MITRE ATT&CK technique. On two such corpora (17 LLM probes, 23 Web probes) every finding produces a rule, all 17/17 emitted rules parse and convert to Splunk and Elasticsearch, and replay on a live OpenSearch SIEM yields 30% detection on a held-out AdvBench subset and 14% on HarmBench at 7.7% false positives on benign traffic (Web side validated structurally only). The contribution is positioned as a verifiable, byte-stable, re-derivable path trading generative breadth for exact reproducibility and probe-level traceability.

Significance. If the central claim holds under the stated precondition of a locked corpus, the work supplies a reproducible, auditable alternative to manual or fully generative rule authoring that directly supports security operations. The explicit use of a published corpus and template library to guarantee byte-stable re-derivability is a concrete strength that enables independent verification and traceability from any fired alert back to the originating probe.
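The traceability property described here amounts to a reverse lookup: given the rule that fired, recover the probe and technique it was synthesized from. A minimal sketch follows, assuming the back-reference is stored as a `probe:<id>` entry under `references` and the technique as an `attack.tNNNN` tag. The tag form follows common Sigma usage, but the `probe:` scheme and the `Traceback` type are hypothetical, not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Matches technique tags such as "attack.t1190" or "attack.t1059.001".
_TECHNIQUE_TAG = re.compile(r"attack\.(t\d{4}(?:\.\d{3})?)")


@dataclass(frozen=True)
class Traceback:
    """Typed result of tracing a fired rule back to its origin."""
    probe_id: str
    techniques: Tuple[str, ...]


def trace_rule(rule: dict) -> Traceback:
    """Recover the originating probe ID and ATT&CK techniques from a rule."""
    probe_ids = [
        ref.split(":", 1)[1]
        for ref in rule.get("references", [])
        if ref.startswith("probe:")
    ]
    if len(probe_ids) != 1:
        raise ValueError(f"expected exactly one probe reference, found {len(probe_ids)}")
    techniques = tuple(
        match.group(1).upper()
        for tag in rule.get("tags", [])
        if (match := _TECHNIQUE_TAG.fullmatch(tag))
    )
    return Traceback(probe_ids[0], techniques)


fired_rule = {
    "title": "Starter rule for probe LLM01-0001",
    "references": ["probe:LLM01-0001"],
    "tags": ["attack.t1190", "attack.initial-access"],
}
print(trace_rule(fired_rule))
```

Raising on zero or multiple probe references keeps the mapping one-to-one, so an alert can never be attributed to an ambiguous origin.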

minor comments (2)

[Abstract] The description of the synthesis function itself is high-level; adding concise pseudocode or an explicit mapping table (even if the full implementation is in supplementary material) would make the deterministic claim easier to assess without requiring the reader to reconstruct the function from the template count alone.
[Abstract] The Web corpus results are described as 'validated structurally' rather than against a held-out attack set; a brief clarification of what structural validation entails would remove ambiguity about the strength of that side of the evaluation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of our contribution and for the positive assessment of the deterministic synthesis approach under the locked-corpus precondition. The recommendation for minor revision is noted. No major comments were enumerated in the report, so we have no specific points requiring point-by-point rebuttal at this time.

Circularity Check

0 steps flagged

No significant circularity identified

The paper scopes its contribution explicitly to a deterministic synthesis function operating on a locked probe corpus that supplies stable identifiers, mapping each bypassed finding to a starter Sigma rule via a fixed external template library (N=23, indexed by OWASP categories) with back-references to MITRE ATT&CK. This mapping is presented as a direct, byte-stable function of the input corpus and templates rather than a fitted model or self-referential derivation; the 17/17 parse rate, backend conversions, and held-out replay results are reported as empirical measurements under that precondition, not as evidence that the precondition can be relaxed. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the described chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the pre-existence of locked probe corpora and a fixed 23-template library as inputs; no free parameters are fitted within the synthesis itself.

free parameters (1)

Template library size = 23
N=23 templates chosen to index OWASP LLM and Web Top 10 categories.

axioms (1)

domain assumption: Locked corpora ensure stable probe identifiers for traceability.
The deterministic mapping and back-reference mechanism requires fixed, non-changing probe sets.
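This summary does not say how the corpus is locked. One hypothetical way to enforce the assumption is a manifest of per-probe fingerprints that is checked before synthesis runs, so that any added, removed, or edited probe is reported. The function names and the fingerprint layout below are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def probe_fingerprint(probe_id: str, payload: str) -> str:
    """SHA-256 over the identifier and payload, so neither can drift silently."""
    return hashlib.sha256(f"{probe_id}\n{payload}".encode("utf-8")).hexdigest()


def build_manifest(corpus: Dict[str, str]) -> Dict[str, str]:
    """Record one fingerprint per probe at the moment the corpus is locked."""
    return {pid: probe_fingerprint(pid, payload) for pid, payload in corpus.items()}


def verify_lock(corpus: Dict[str, str], manifest: Dict[str, str]) -> List[str]:
    """Return the sorted probe IDs that were added, removed, or changed."""
    drifted = []
    for pid in sorted(set(corpus) | set(manifest)):
        if pid not in corpus or pid not in manifest:
            drifted.append(pid)  # added or removed since locking
        elif probe_fingerprint(pid, corpus[pid]) != manifest[pid]:
            drifted.append(pid)  # payload edited since locking
    return drifted


corpus = {"LLM01-0001": "payload a", "LLM01-0002": "payload b"}
manifest = build_manifest(corpus)
assert verify_lock(corpus, manifest) == []
```

An empty result means every identifier still refers to the probe it referred to when the rules were synthesized, which is the condition the back-references depend on.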

pith-pipeline@v0.9.1-grok · 5832 in / 1315 out tokens · 33569 ms · 2026-06-28T05:36:16.479077+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 2 canonical work pages · 1 internal anchor

[1] Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, and Nanna Inie. garak: A framework for security probing large language models, 2024.

[2] Alexandre Cristovão Maiorano. Which defense closes which threat? Attributing OWASP-LLM-Top-10 coverage and its brittleness under paraphrasing, 2026. URL https://arxiv.org/abs/2606.02822

[3] Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, and Dan Hendrycks. HarmBench: A standardized evaluation framework for automated red teaming and robust refusal. In International Conference on Machine Learning (ICML), 2024.

[4] OWASP Foundation. OWASP Top 10:2021. https://owasp.org/Top10/, 2021.

[5] OWASP GenAI Security Project. OWASP Top 10 for LLM applications. https://genai.owasp.org/llm-top-10/, 2025.

[6] Akansha Shukla, Parth Atulbhai Gandhi, Yuval Elovici, and Asaf Shabtai. RuleGenie: SIEM detection rule set optimization, 2025.

[7] SigmaHQ. Sigma: Generic signature format for SIEM systems. https://github.com/SigmaHQ/sigma, 2024.

[8] SigmaHQ Contributors. SigmaHQ community rule library. https://github.com/SigmaHQ/sigma, 2024.

[9] The MITRE Corporation. MITRE CALDERA: Adversary emulation platform. https://caldera.mitre.org/, 2024.

[10] The MITRE Corporation. MITRE ATT&CK. https://attack.mitre.org/, 2024.

[11] The OpenSearch Project. OpenSearch: An open-source distributed search and analytics suite. https://opensearch.org/, 2024.

[12] Prasanna N. Wudali, Moshe Kravchik, Ehud Malul, Parth A. Gandhi, Yuval Elovici, and Asaf Shabtai. Rule-ATT&CK mapper (RAM): Mapping SIEM rules to TTPs using LLMs, 2025. URL https://arxiv.org/abs/2502.02337

[13] Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023.

Appendix A, Worked Examples: BAS Finding→Sigma Rule. This appendix walks through three end-to-end examples covering the three template-source classes (legacy MITRE T-code, OWASP LLM, OWASP Web)....
