From Attack Simulation to SIEM Rule: Deterministic Detection-as-Code Synthesis with Probe-Level Traceability
Pith reviewed 2026-06-28 05:36 UTC · model grok-4.3
The pith
When probes come from a locked corpus, each bypassed finding maps deterministically to a starter Sigma rule with full traceability back to the probe.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents a deterministic synthesis function that takes each bypassed-probe finding from a locked corpus and produces a starter Sigma rule via template lookup, preserving a typed traceback to the originating probe and MITRE ATT&CK technique; on the tested corpora this function succeeds for every finding and yields rules that parse and convert to Splunk and Elasticsearch backends.
What carries the argument
The deterministic synthesis function, which indexes a template library by OWASP categories and attaches probe identifiers for traceability.
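A minimal sketch of what a template-lookup synthesis of this kind could look like. Everything named here is an illustrative assumption rather than the paper's implementation: the `Finding` type, the `TEMPLATES` table, the `synthesize` function, the `LLM01` key, the `T1190` technique, the log source, and the custom `probe_id` field.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    probe_id: str    # stable identifier supplied by the locked corpus
    category: str    # OWASP category key (hypothetical naming scheme)
    technique: str   # MITRE ATT&CK technique ID
    bypassed: bool   # True when the probe got past the defence


# Hypothetical one-entry template library keyed by OWASP category.
TEMPLATES = {
    "LLM01": {
        "title": "Possible prompt injection",
        "logsource": {"category": "application", "product": "llm_gateway"},
        "detection": {
            "selection": {"prompt|contains": ["ignore previous instructions"]},
            "condition": "selection",
        },
        "level": "medium",
    },
}


def synthesize(finding):
    """Return a starter rule dict for a bypassed finding, else None."""
    if not finding.bypassed:
        return None
    # Pure lookup: no randomness, no model call, so same input gives same output.
    rule = copy.deepcopy(TEMPLATES[finding.category])
    rule["tags"] = ["attack." + finding.technique.lower()]
    rule["probe_id"] = finding.probe_id  # traceback to the originating probe
    return rule


rule = synthesize(Finding("probe-001", "LLM01", "T1190", True))
```

Because the function only copies a template and attaches fields taken from the finding, calling it twice on the same finding returns equal rules, which is the property the locked corpus is meant to make checkable.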
If this is right
- Every bypassed-probe finding produces a starter rule.
- All emitted rules parse and convert to Splunk and Elasticsearch backends.
- The LLM rules detect 30% of a held-out AdvBench subset and 14% of HarmBench at 7.7% false positives on a benign baseline.
- The path from finding to rule is byte-stable and re-derivable from the published corpus and templates alone.
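One way to read the byte-stability claim is that two independent derivations from the same corpus and templates should serialize to identical bytes. The sketch below checks this with a digest. It assumes rules are plain dicts carrying a hypothetical `probe_id` field, and it uses canonical JSON as a stand-in for Sigma's YAML so the example needs only the standard library.

```python
import hashlib
import json


def canonical_bytes(rule):
    """Serialize a rule with fixed key order and separators."""
    text = json.dumps(rule, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def digest(rules):
    """SHA-256 over the canonical form of all rules, ordered by probe ID."""
    h = hashlib.sha256()
    for rule in sorted(rules, key=lambda r: r["probe_id"]):
        h.update(canonical_bytes(rule))
        h.update(b"\n")
    return h.hexdigest()


# Same rules, different list order and key order: the digests should match.
rules_a = [{"probe_id": "p-002", "title": "B"}, {"probe_id": "p-001", "title": "A"}]
rules_b = [{"title": "A", "probe_id": "p-001"}, {"title": "B", "probe_id": "p-002"}]
same = digest(rules_a) == digest(rules_b)
```

Publishing such a digest alongside the corpus would let a third party confirm a re-derivation without comparing rule files by hand.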
Where Pith is reading between the lines
- Similar deterministic mappings could be developed for other detection formats beyond Sigma.
- The approach trades generative flexibility for guaranteed reproducibility and auditability.
- Extending the template library would allow coverage of additional attack categories without changing the core method.
Load-bearing premise
The probes must come from a locked corpus that provides stable identifiers for each finding.
What would settle it
Running the synthesis function on the published 17-probe LLM corpus and finding that it fails to emit exactly 17 valid, traceable rules or that the rules do not convert to Splunk and Elasticsearch backends.
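The count and traceback half of that test can be sketched as a small checker. It assumes each emitted rule is a dict with a hypothetical `probe_id` field and that probe identifiers look like `llm-01`; backend conversion to Splunk and Elasticsearch is out of scope here and would need Sigma tooling.

```python
from collections import Counter


def check_corpus(probe_ids, rules):
    """Return a list of problems; an empty list means both checks pass."""
    expected = set(probe_ids)
    counts = Counter(rule.get("probe_id") for rule in rules)
    problems = []
    if len(rules) != len(probe_ids):
        problems.append(f"expected {len(probe_ids)} rules, got {len(rules)}")
    for pid in sorted(expected - set(counts)):
        problems.append(f"no rule traces back to {pid}")
    for pid, n in counts.items():
        if pid not in expected:
            problems.append(f"rule references unknown probe {pid!r}")
        elif n > 1:
            problems.append(f"{n} rules trace back to {pid}")
    return problems


# Hypothetical identifiers standing in for the 17-probe LLM corpus.
probes = [f"llm-{i:02d}" for i in range(1, 18)]
good = [{"probe_id": pid} for pid in probes]
bad = good[:-1]  # one rule missing
```

Here `check_corpus(probes, good)` returns an empty list, while `check_corpus(probes, bad)` reports the wrong count and the probe with no rule.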
Original abstract
Security teams routinely simulate attacks against their own systems to check whether their monitoring would catch a real intruder. These Breach-and-Attack-Simulation (BAS) tools surface findings, but the security information and event management (SIEM) systems that watch production need detection rules -- and today a human bridges that gap by hand, reading each finding and writing the corresponding Sigma rule (a vendor-neutral detection format). We show this translation can be partially automated when probes are drawn from a locked corpus, so each finding carries a stable identifier back to the originating probe. We describe a deterministic synthesis function that maps each finding to a starter Sigma rule through a small template library (N=23, indexed by categories from the OWASP LLM and Web Top 10), with a back-reference to the originating finding and its MITRE ATT&CK technique. On two locked corpora (17-probe LLM, 23-probe Web), every bypassed-probe finding yields a starter rule, and all 17/17 emitted rules parse and convert to Splunk and Elasticsearch backends. Replayed through a live OpenSearch SIEM, the LLM rules fire on 30% of a held-out AdvBench subset and 14% of HarmBench at 7.7% false positives on a benign baseline; the Web side is validated structurally, not against a held-out attack set. The contribution is a verifiable, byte-stable path from BAS finding to operator-deployable starter rule, re-derivable from the published corpus and template library alone -- trading the breadth of LLM-generative methods for exact reproducibility and a typed traceback from any fired alert to the originating probe.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that, for probes drawn from a locked corpus providing stable identifiers, a deterministic synthesis function using a fixed library of 23 templates (indexed by OWASP LLM and Web Top 10 categories) can map each bypassed-probe BAS finding to a starter Sigma rule that includes a back-reference to the probe and its MITRE ATT&CK technique. On two such corpora (17 LLM probes, 23 Web probes) every finding produces a rule, all 17/17 emitted rules parse and convert to Splunk and Elasticsearch, and replay on a live OpenSearch SIEM yields 30% detection on a held-out AdvBench subset and 14% on HarmBench at 7.7% false positives on benign traffic (Web side validated structurally only). The contribution is positioned as a verifiable, byte-stable, re-derivable path trading generative breadth for exact reproducibility and probe-level traceability.
Significance. If the central claim holds under the stated precondition of a locked corpus, the work supplies a reproducible, auditable alternative to manual or fully generative rule authoring that directly supports security operations. The explicit use of a published corpus and template library to guarantee byte-stable re-derivability is a concrete strength that enables independent verification and traceability from any fired alert back to the originating probe.
minor comments (2)
- [Abstract] The description of the synthesis function is high-level; adding concise pseudocode or an explicit mapping table (even if the full implementation is in supplementary material) would make the deterministic claim easier to assess without requiring the reader to reconstruct the function from the template count alone.
- The Web corpus results are described as 'validated structurally' rather than against a held-out attack set; a brief clarification of what structural validation entails would avoid any ambiguity about the strength of that side of the evaluation.
Simulated Author's Rebuttal
We thank the referee for the detailed summary of our contribution and for the positive assessment of the deterministic synthesis approach under the locked-corpus precondition. The recommendation for minor revision is noted. No major comments were enumerated in the report, so we have no specific points requiring point-by-point rebuttal at this time.
Circularity Check
No significant circularity identified
The paper scopes its contribution explicitly to a deterministic synthesis function operating on a locked probe corpus that supplies stable identifiers, mapping each bypassed finding to a starter Sigma rule via a fixed external template library (N=23, indexed by OWASP categories) with back-references to MITRE ATT&CK. This mapping is presented as a direct, byte-stable function of the input corpus and templates rather than a fitted model or self-referential derivation; the 17/17 parse rate, backend conversions, and held-out replay results are reported as empirical measurements under that precondition, not as evidence that the precondition can be relaxed. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the described chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- Template library size = 23
axioms (1)
- Domain assumption: locked corpora ensure stable probe identifiers for traceability.