arxiv: 2604.11078 · v1 · submitted 2026-04-13 · 💻 cs.CR


From Context to Rules: Toward Unified Detection Rule Generation

Baoxu Liu, Cheng Meng, Fangli Ren, Qiuyun Wang, Wenxin Le, Xinyi Li, Zhengwei Jiang


Pith reviewed 2026-05-10 16:22 UTC · model grok-4.3

classification 💻 cs.CR

keywords detection rule generation · semantic projection · RAG framework · unified mapping · cybersecurity rules · LLM generation · agentic system · rule optimization


The pith

Dual semantic projections let one framework generate detection rules from any context in any language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to replace separate pipelines for each context-language pair with a single mapping that turns contexts and target languages into detection rules. It does this by projecting inputs into two separate semantic spaces, one for detection intent and one for detection logic, then using an agentic RAG system to retrieve and generate rules. A sympathetic reader would care because current methods require custom engineering for every new input type or output language, while this abstraction promises one system that works across all of them. The experiments measure the gain through 12,000 pairwise preference comparisons.

Core claim

Detection rule generation can be formalized as the unified function f mapping from context C and language L to rules R, with optimal rules defined by minimal semantic distance; UniRule realizes this by retrieving from dual projection spaces of intent and logic, and the resulting rules are preferred over pure LLM outputs in 12,000 comparisons across three languages and four context types.
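One way to write this claim down, as a sketch rather than the paper's exact definition: the symbols $R_\ell$ (the set of valid rules in target language $\ell$) and $d$ (a semantic distance between a context and a rule) are notation assumed here for illustration and do not come from the paper.

```latex
f : C \times L \to R, \qquad f(c, \ell) \in \arg\min_{r \in R_\ell} d(c, r)
```

Read this way, changing the context type or the target language changes only the arguments of $f$, not the mapping itself, which is the sense in which the task is "unified".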

What carries the argument

Dual semantic projection spaces that separately encode detection intent and detection logic, allowing retrieval-augmented generation to produce rules for arbitrary inputs inside a single agentic framework.
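The retrieval idea can be illustrated with a minimal sketch. Everything here is hypothetical: the bag-of-words `embed` function stands in for a real embedding model, and the `IndexedRule` class, the `retrieve` helper, and the three example rules are invented for illustration, not taken from UniRule. The point is only that each source rule is described twice (intent and logic) and each description is searched independently.

```python
import math
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class IndexedRule:
    rule_id: str
    intent: str  # description of the threat the rule targets
    logic: str   # description of the technical pattern the rule matches


def retrieve(query: str, rules: list[IndexedRule], space: str, k: int = 2) -> list[str]:
    """Return ids of the k rules closest to the query in one space ('intent' or 'logic')."""
    q = embed(query)
    scored = [(cosine(q, embed(getattr(r, space))), r.rule_id) for r in rules]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, id breaks ties
    return [rule_id for _, rule_id in scored[:k]]


# Hypothetical index entries, written for this example only.
RULES = [
    IndexedRule("r1", "detect files with double extension used to disguise executables",
                "match file name ending in two extensions such as pdf exe"),
    IndexedRule("r2", "detect brute force login attempts",
                "count failed authentication events per source above threshold"),
    IndexedRule("r3", "detect credential dumping from memory",
                "match process access to lsass with suspicious access mask"),
]

intent_hits = retrieve("disguised executables with double extension", RULES, "intent", k=1)
logic_hits = retrieve("count failed authentication events", RULES, "logic", k=1)
print(intent_hits, logic_hits)
```

Keeping the two spaces separate means a query phrased as a threat description and a query phrased as a matching pattern can surface different reference rules, which a generator could then combine.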

If this is right

Rule generation no longer requires a dedicated pipeline for each input-output combination.
Semantic distance in the projected spaces can serve as a general criterion for selecting or ranking rules.
The same retrieval and generation steps apply without modification when the input context or output language changes.
Performance can be quantified uniformly across scenarios using pairwise preference data and Bradley-Terry modeling.
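To make the Bradley-Terry figure more concrete, the sketch below converts a coefficient into a win probability. It assumes the reported coefficient is a log-odds advantage over a baseline fixed at zero, which is the usual Bradley-Terry parameterization but is not confirmed by the abstract; the function names are hypothetical.

```python
import math


def bt_win_probability(coefficient: float) -> float:
    """P(system preferred over baseline) for a log-odds Bradley-Terry coefficient."""
    return 1.0 / (1.0 + math.exp(-coefficient))


def bt_coefficient_two_items(wins: int, losses: int) -> float:
    """Maximum-likelihood log-odds for a two-item comparison, ties excluded."""
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss for a finite estimate")
    return math.log(wins / losses)


p = bt_win_probability(0.52)
print(round(p, 3))
```

Under that assumption, a coefficient of 0.52 corresponds to the system being preferred in roughly 63% of decided comparisons.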

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Security teams could maintain fewer custom scripts when writing rules for multiple languages or log formats.
The same dual-space design might be tested on other structured generation tasks such as policy or compliance rule creation.
If the projections remain lossless, the framework could be extended to generate rules directly from raw data streams rather than pre-processed contexts.

Load-bearing premise

The two semantic spaces can capture every relevant detail of any context and any target language without information loss or the need for extra adjustments.

What would settle it

A new context-language combination where expert raters consistently prefer rules produced by direct LLM prompting over those produced by the dual-projection system in head-to-head comparisons.

Figures

Figures reproduced from arXiv: 2604.11078 by Baoxu Liu, Cheng Meng, Fangli Ren, Qiuyun Wang, Wenxin Le, Xinyi Li, Zhengwei Jiang.

**Figure 1.** Overview of UniRule. At runtime (left), given a detection context and target language, an LLM agent autonomously retrieves relevant rules as needed and generates the output. Offline (right), heterogeneous source rules are translated into detection intent and detection logic descriptions, then embedded and indexed. The functions I and Cov defined in §3 are central to rule quality but cannot be computed from…

**Figure 2.** Per-scenario breakdown. Of the 12 scenarios, UniRule is significantly positive in 9 and significantly negative in 3, with no non-significant results. All 8 Splunk and Elastic scenarios show significant improvement, with coefficients ranging from 0.28 to 1.51. These languages capture behaviors (e.g., event counts, field patterns) where semantic retrieval can fill information gaps with transfer…

**Figure 3.** Semantic decomposition of a Splunk rule detecting double-extension files (ID: b06a555e-dce0-417d-a2eb-28a5d8d66ef7). The rule is translated into detection intent (threat semantics) and detection logic (technical patterns). Bold text shows summaries; gray text shows full descriptions.

**Figure 4.** UniRule reasoning trace and output comparison. Left: the agent retrieves reference rules from both intent and logic spaces. Right: UniRule generates a more comprehensive rule than the Human-Authored alternative.

Original abstract

Existing methods for detection rule generation are tightly coupled to specific input-output combinations, requiring dedicated pipelines for each. We formalize this problem as a unified mapping f: C × L → R and characterize optimal rules through semantic distance. We propose UniRule, an agentic RAG framework built on dual semantic projection spaces: detection intent and detection logic. This design enables retrieval and generation across arbitrary contexts and target languages within a single system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) show that UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52, validating semantic projection as an effective abstraction for unified rule generation. Together, the formalization, method, and evaluation provide an initial framework for studying detection rule generation as a unified task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

UniRule's unified formalization and dual-projection RAG offer a clean new framing for rule generation, but the experiments stay too narrow to confirm it works beyond the tested cases.


UniRule formalizes detection rule generation as one mapping f from context and language to rules, then uses dual semantic projections for intent and logic inside an agentic RAG setup. That single-system approach is not how most prior pipelines were built, so the abstraction itself is the clearest step forward here.

The experiments add some weight: 12 scenarios across 3 languages and 4 context types, 12,000 pairwise comparisons, and a Bradley-Terry coefficient of 0.52 favoring UniRule over plain LLM output. That is a measurable edge on the reported scale and shows the retrieval step can help in practice for those specific inputs.

The soft spot is the unexamined claim that the projections stay faithful for arbitrary contexts and languages. Only four context types were tried, so nothing tests what happens when a new construct falls outside the embedding vocabulary or when retrieval misses critical details. The abstract also gives no information on how the 12,000 pairs were constructed, what the baseline implementations actually did, or whether any significance testing was run, which leaves the win harder to interpret.

This paper is for security engineers and researchers who maintain detection rules across platforms and want to cut down on per-language pipelines. A reader already working on rule automation would get a usable starting framework and some comparative numbers to build on.

I would send it to peer review. The formalization is worth referee time even if the current evidence needs tightening on generality and reproducibility.

Referee Report

1 major / 0 minor

Summary. The paper formalizes detection rule generation as a unified mapping f: C × L → R characterized by semantic distance. It proposes UniRule, an agentic RAG framework using dual semantic projection spaces (detection intent and detection logic) to handle arbitrary contexts and target languages in one system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) claim UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52.

Significance. If validated, this provides a useful initial framework for treating detection rule generation as a single unified task rather than fragmented per-task pipelines. The formalization and scale of the human/AI comparison are strengths, though the evaluation lacks necessary methodological details.

major comments (1)

[Experiments] Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.
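For readers unfamiliar with the inter-rater agreement the comment asks for, a minimal sketch of Cohen's kappa for two raters follows. The function name and the toy labels are hypothetical and are not data from the paper; which agreement statistic the authors would actually report is not stated in the abstract.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up preference labels for four comparisons (illustration only).
a = ["unirule", "unirule", "baseline", "baseline"]
b = ["unirule", "unirule", "baseline", "unirule"]
kappa = cohens_kappa(a, b)
print(kappa)
```

In this toy case the raters agree on 3 of 4 items, chance agreement is 0.5, and kappa is 0.5, which shows why raw agreement alone overstates consistency.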

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the single major comment below and will incorporate the requested details into a revised version of the paper.


Referee: Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.

Authors: We agree that the current manuscript does not include sufficient methodological details to fully substantiate the experimental claims. In the revised version, we will expand the Experiments section with: (1) explicit descriptions of the baseline (pure LLM generation without agentic RAG or dual projections), (2) the procedure for constructing the 12,000 pairwise comparisons, including how scenarios were sampled across the 3 languages and 4 context types, (3) the statistical testing approach used to evaluate the Bradley-Terry coefficient of 0.52 (including any significance tests or confidence intervals), and (4) inter-rater agreement metrics for the human/AI preference judgments. These additions will directly address the concern and strengthen the central outperformance result.

revision: yes

Circularity Check

0 steps flagged

No circularity: formalization and external evaluation are independent


The paper defines the unified mapping f: C × L → R and introduces dual semantic projection spaces as the basis for UniRule, then reports outperformance via Bradley-Terry ranking on 12,000 external pairwise judgments. No equation or claim reduces the superiority result to an internal fit, self-citation chain, or definitional tautology; the validation metric is computed from independent judgments rather than from parameters fitted inside the projection or retrieval components. The limited scope of the 12 scenarios is a generalization risk but does not create circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that semantic distance in the two projection spaces is sufficient to characterize optimal rules and that an agentic RAG system can operationalize retrieval and generation without task-specific engineering.

axioms (1)

domain assumption: Optimal detection rules are characterized by semantic distance in dual projection spaces of intent and logic.
Invoked when the problem is formalized as f: C × L → R and when semantic projection is said to validate the approach.

invented entities (1)

Dual semantic projection spaces (detection intent and detection logic) · no independent evidence
purpose: To decouple context from target language so that a single retrieval-generation pipeline works across all combinations.
Newly introduced architectural component with no independent evidence supplied beyond the reported experiments.


