pith. machine review for the scientific record.

arxiv: 2604.11078 · v1 · submitted 2026-04-13 · 💻 cs.CR

Recognition: unknown

From Context to Rules: Toward Unified Detection Rule Generation

Baoxu Liu, Cheng Meng, Fangli Ren, Qiuyun Wang, Wenxin Le, Xinyi Li, Zhengwei Jiang


Pith reviewed 2026-05-10 16:22 UTC · model grok-4.3

classification 💻 cs.CR
keywords: detection rule generation, semantic projection, RAG framework, unified mapping, cybersecurity rules, LLM generation, agentic system, rule optimization

The pith

Dual semantic projections let one framework generate detection rules from any context in any language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to replace separate pipelines for each context-language pair with a single mapping that turns contexts and target languages into detection rules. It does this by projecting inputs into two separate semantic spaces, one for detection intent and one for detection logic, and then using an agentic RAG system to retrieve and generate rules. A sympathetic reader would care because current methods require custom engineering for every new input type or output language, while this abstraction promises one system that works across all of them. The experiments measure the gain through 12,000 pairwise preference comparisons.

Core claim

Detection rule generation can be formalized as a unified function f: C × L → R that maps contexts in C and target languages in L to rules in R, with optimal rules defined by minimal semantic distance. UniRule realizes this by retrieving from dual projection spaces for intent and logic, and its rules are preferred over pure LLM outputs in 12,000 comparisons across three languages and four context types.
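Read literally, the core claim can be written as follows. This is a sketch under stated assumptions, not the paper's definition: σ (a map from contexts and rules into one semantic space), d (a distance on that space), and R_ℓ (the rules expressible in target language ℓ) are notation introduced here; the paper's own quantities, including the functions I and Cov mentioned in Figure 1, are not reproduced.

```latex
f : \mathcal{C} \times \mathcal{L} \to \mathcal{R},
\qquad
f^{\ast}(c, \ell) \in \operatorname*{arg\,min}_{r \in \mathcal{R}_{\ell}}
  d\bigl(\sigma(r), \sigma(c)\bigr),
\qquad c \in \mathcal{C},\ \ell \in \mathcal{L}
```

Under this reading, an optimal f returns a valid rule in the requested language whose semantics sit closest to what the context asks to detect.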

What carries the argument

Dual semantic projection spaces that separately encode detection intent and detection logic, allowing retrieval-augmented generation to produce rules for arbitrary inputs inside a single agentic framework.
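The dual-projection retrieval can be illustrated with a toy index that keeps one vector space for intent descriptions and one for logic descriptions. This is a minimal sketch, not UniRule's implementation: the bag-of-words `embed` function stands in for a real embedding model (the reference list includes Qwen3 Embedding, but nothing here depends on it), and `IndexedRule`, `DualSpaceIndex`, and the two example rules are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class IndexedRule:
    rule_id: str
    language: str  # target language of the source rule, e.g. "splunk"
    intent: str    # natural-language description of what threat is detected
    logic: str     # natural-language description of how it is detected


class DualSpaceIndex:
    """Hypothetical index keeping a separate vector space per projection."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.spaces = {
            "intent": [embed(r.intent) for r in self.rules],
            "logic": [embed(r.logic) for r in self.rules],
        }

    def search(self, query: str, space: str, k: int = 3):
        """Return the k (score, rule_id) pairs closest to `query` in one space."""
        if space not in self.spaces:
            raise ValueError(f"unknown space: {space!r}")
        q = embed(query)
        scored = [(cosine(q, v), r.rule_id) for r, v in zip(self.rules, self.spaces[space])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = DualSpaceIndex([
    IndexedRule("r1", "splunk", "detect files disguised with a double extension",
                "match file names ending in two extensions such as .pdf.exe"),
    IndexedRule("r2", "elastic", "detect credential dumping from lsass",
                "process access events targeting lsass.exe"),
])
# Both queries rank the double-extension rule (r1) first.
print(index.search("attacker hides an executable behind a double extension", "intent", k=1))
print(index.search("file name ends with .pdf.exe", "logic", k=1))
```

The search deliberately ignores the `language` field, so a rule written in one target language can surface as a reference for generating in another; querying each space separately and leaving the merge to the caller is a design choice of this sketch.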

If this is right

  • Rule generation no longer requires a dedicated pipeline for each input-output combination.
  • Semantic distance in the projected spaces can serve as a general criterion for selecting or ranking rules.
  • The same retrieval and generation steps apply without modification when the input context or output language changes.
  • Performance can be quantified uniformly across scenarios using pairwise preference data and Bradley-Terry modeling.
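Bradley-Terry modeling turns pairwise preferences into per-system strengths. A minimal sketch using the classic minorization-maximization (Zermelo/Hunter) iteration follows; the function name, system names, and win counts are made up for illustration and are not the paper's data, and the paper may have fit the model differently (for example, as a logistic regression).

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, max_iter=1000, tol=1e-12):
    """Fit Bradley-Terry log-strengths with the MM (Zermelo/Hunter) iteration.

    comparisons: iterable of (winner, loser) pairs, ties already removed.
    The MLE exists only if every system both wins and loses at least once
    (more precisely, if the win graph is strongly connected).
    Returns log-strengths normalized to sum to zero; the coefficient of A over B
    is theta[A] - theta[B], and P(A beats B) = 1 / (1 + exp(-(theta[A] - theta[B]))).
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # unordered pair -> number of comparisons
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}
    for _ in range(max_iter):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom
        # Bradley-Terry is scale-invariant; divide by the geometric mean to pin the scale.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {i: v / math.exp(log_mean) for i, v in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break
    return {i: math.log(v) for i, v in strength.items()}


# Made-up counts for illustration only (not the paper's data).
toy = [("unirule", "pure_llm")] * 63 + [("pure_llm", "unirule")] * 37
theta = fit_bradley_terry(toy)
coef = theta["unirule"] - theta["pure_llm"]  # equals ln(63/37), about 0.532, for two systems
```

With only two systems the fit collapses to the log-odds of the observed win share; the iteration matters once a third system (such as a human-authored baseline) enters the comparisons.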

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Security teams could maintain fewer custom scripts when writing rules for multiple languages or log formats.
  • The same dual-space design might be tested on other structured generation tasks such as policy or compliance rule creation.
  • If the projections remain lossless, the framework could be extended to generate rules directly from raw data streams rather than pre-processed contexts.

Load-bearing premise

The two semantic spaces can capture every relevant detail of any context and any target language without information loss or the need for extra adjustments.

What would settle it

A new context-language combination where expert raters consistently prefer rules produced by direct LLM prompting over those produced by the dual-projection system in head-to-head comparisons.

Figures

Figures reproduced from arXiv: 2604.11078 by Baoxu Liu, Cheng Meng, Fangli Ren, Qiuyun Wang, Wenxin Le, Xinyi Li, Zhengwei Jiang.

Figure 1: Overview of UniRule. At runtime (left), given a detection context and target language, an LLM agent autonomously retrieves relevant rules as needed and generates the output. Offline (right), heterogeneous source rules are translated into detection intent and detection logic descriptions, then embedded and indexed. The functions I and Cov defined in §3 are central to rule quality but cannot be computed from…
Figure 2: Per-scenario breakdown. Of the 12 scenarios, UniRule's coefficient is significantly positive in 9 and significantly negative in 3, with no non-significant results. All 8 Splunk and Elastic scenarios show significant improvement, with coefficients ranging from 0.28 to 1.51. These languages capture behaviors (e.g., event counts, field patterns) where semantic retrieval can fill information gaps with transfer…
Figure 3: Semantic decomposition of a Splunk rule detecting double-extension files (ID: b06a555e-dce0-417d-a2eb-28a5d8d66ef7). The rule is translated into detection intent (threat semantics) and detection logic (technical patterns). Bold text shows summaries; gray text shows full descriptions.
Figure 4: UniRule reasoning trace and output comparison. Left: the agent retrieves reference rules from both intent and logic spaces. Right: UniRule generates a more comprehensive rule than the Human-Authored alternative.
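Figures 1 and 4 describe an LLM agent that decides at runtime when to query the intent and logic spaces before it generates a rule. A minimal sketch of such a tool-use loop follows. Everything in it is hypothetical: a scripted `policy` stands in for the LLM, the tool names `search_intent` and `search_logic` and the step budget are assumptions, and the stub tools return canned strings instead of querying a real index.

```python
from typing import Callable

Transcript = list[tuple[str, str]]
Policy = Callable[[Transcript], tuple[str, str]]


def run_agent(context: str, language: str, policy: Policy,
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 6) -> tuple[str, Transcript]:
    """Let `policy` alternate between retrieval tool calls and a final 'generate' action."""
    transcript: Transcript = [("context", context), ("language", language)]
    for _ in range(max_steps):
        action, argument = policy(transcript)
        if action == "generate":
            return argument, transcript  # argument holds the final rule text
        if action not in tools:
            raise ValueError(f"unknown tool: {action}")
        transcript.append((action, tools[action](argument)))  # record the observation
    raise RuntimeError("step budget exhausted before a rule was generated")


# Stub tools standing in for searches over the two projection spaces.
tools = {
    "search_intent": lambda q: f"[intent hits for: {q}]",
    "search_logic": lambda q: f"[logic hits for: {q}]",
}


def scripted_policy(transcript: Transcript) -> tuple[str, str]:
    """Stand-in for the LLM: search intent, then logic, then generate."""
    done = {step for step, _ in transcript}
    if "search_intent" not in done:
        return "search_intent", "double-extension executables"
    if "search_logic" not in done:
        return "search_logic", "file name ends with two extensions"
    return "generate", "<rule text in the target language>"


rule, trace = run_agent("Phishing attachments use names like invoice.pdf.exe", "splunk",
                        scripted_policy, tools)
```

The step budget keeps a misbehaving policy from looping forever; in a real agent the policy would be an LLM call that reads the transcript and chooses the next action.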
Original abstract

Existing methods for detection rule generation are tightly coupled to specific input-output combinations, requiring dedicated pipelines for each. We formalize this problem as a unified mapping f: C × L → R and characterize optimal rules through semantic distance. We propose UniRule, an agentic RAG framework built on dual semantic projection spaces: detection intent and detection logic. This design enables retrieval and generation across arbitrary contexts and target languages within a single system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) show that UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52, validating semantic projection as an effective abstraction for unified rule generation. Together, the formalization, method, and evaluation provide an initial framework for studying detection rule generation as a unified task.
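To make the reported coefficients easier to read: if a Bradley-Terry coefficient is the log-odds of UniRule being preferred over the baseline (the natural-log scale of a logistic fit, not an Elo-style rescaling, with ties excluded), the logistic function converts it to an implied win probability. That scale is an assumption, since the abstract does not state it; `win_probability` is a hypothetical helper.

```python
import math


def win_probability(coef: float) -> float:
    """Implied P(UniRule preferred over the baseline) for a log-odds coefficient."""
    return 1.0 / (1.0 + math.exp(-coef))


reported = {
    "pooled (abstract)": 0.52,
    "lowest Splunk/Elastic scenario (Figure 2)": 0.28,
    "highest Splunk/Elastic scenario (Figure 2)": 1.51,
}
for label, coef in reported.items():
    print(f"{label}: {coef:.2f} -> {win_probability(coef):.3f}")
# pooled (abstract): 0.52 -> 0.627
# lowest Splunk/Elastic scenario (Figure 2): 0.28 -> 0.570
# highest Splunk/Elastic scenario (Figure 2): 1.51 -> 0.819
```

Under that assumption, the pooled 0.52 implies UniRule is preferred in roughly 63% of non-tied comparisons, and the Splunk/Elastic range corresponds to roughly 57% to 82%.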

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper formalizes detection rule generation as a unified mapping f: C × L → R characterized by semantic distance. It proposes UniRule, an agentic RAG framework using dual semantic projection spaces (detection intent and detection logic) to handle arbitrary contexts and target languages in one system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) are reported to show that UniRule significantly outperforms pure LLM generation, with a Bradley-Terry coefficient of 0.52.

Significance. If validated, this provides a useful initial framework for treating detection rule generation as a single unified task rather than fragmented per-task pipelines. The formalization and scale of the human/AI comparison are strengths, though the evaluation lacks necessary methodological details.

major comments (1)
  1. [Experiments] Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.
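The referee asks for inter-rater agreement, and the reference list includes Landis and Koch's agreement scale. For two raters labeling the same comparisons, Cohen's kappa is one standard statistic; whether the paper uses it, Fleiss' kappa, or something else is not stated, so the sketch below is an assumption, and the label values are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance if each rater labels independently at their own base rates.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; undefined, returned as 1.0 here
    return (observed - expected) / (1.0 - expected)


def landis_koch_band(kappa):
    """Descriptive label from the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical labels for four comparisons: "A" = UniRule preferred, "B" = baseline preferred.
kappa = cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(kappa, landis_koch_band(kappa))  # 0.5 moderate
```

Here the raters agree on 3 of 4 items (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.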

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the single major comment below and will incorporate the requested details into a revised version of the paper.

Point-by-point responses
  1. Referee: Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.

    Authors: We agree that the current manuscript does not include sufficient methodological details to fully substantiate the experimental claims. In the revised version, we will expand the Experiments section with: (1) explicit descriptions of the baseline (pure LLM generation without agentic RAG or dual projections), (2) the procedure for constructing the 12,000 pairwise comparisons, including how scenarios were sampled across the 3 languages and 4 context types, (3) the statistical testing approach used to evaluate the Bradley-Terry coefficient of 0.52 (including any significance tests or confidence intervals), and (4) inter-rater agreement metrics for the human/AI preference judgments. These additions will directly address the concern and strengthen the central outperformance result. revision: yes
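One way to attach the promised confidence interval to a two-system Bradley-Terry coefficient is a sandwich (robust) standard error, the estimator discussed in the Freedman reference. The sketch assumes an intercept-only logistic model (what Bradley-Terry reduces to for two systems with ties dropped) and clusters judgments by a caller-supplied key such as scenario or rule pair; the clustering choice, the hypothetical function name, and the omission of a small-sample correction are assumptions, not the authors' documented procedure.

```python
import math
from collections import defaultdict


def bt_coefficient_robust(outcomes, z=1.96):
    """Two-system Bradley-Terry coefficient with a cluster-robust (sandwich) standard error.

    outcomes: list of (cluster_key, y), y = 1 if UniRule was preferred, 0 if the
    baseline was preferred (ties dropped). With two systems the model is an
    intercept-only logistic regression, so beta_hat = logit(share of UniRule wins).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes")
    p_hat = sum(y for _, y in outcomes) / n
    if p_hat in (0.0, 1.0):
        raise ValueError("coefficient is infinite when one side wins every comparison")
    beta = math.log(p_hat / (1.0 - p_hat))
    # "Bread": Fisher information of the intercept at the MLE.
    bread = n * p_hat * (1.0 - p_hat)
    # "Meat": squared sum of score contributions (y - p_hat) within each cluster.
    score_by_cluster = defaultdict(float)
    for key, y in outcomes:
        score_by_cluster[key] += y - p_hat
    meat = sum(s * s for s in score_by_cluster.values())
    se = math.sqrt(meat) / bread  # sqrt(bread^-1 * meat * bread^-1); no small-sample correction
    return beta, se, (beta - z * se, beta + z * se)


# Illustrative data only: every judgment is its own cluster.
outcomes = [(i, 1) for i in range(63)] + [(i, 0) for i in range(63, 100)]
beta, se, ci = bt_coefficient_robust(outcomes)
# Duplicate each judgment inside its own cluster, e.g. two raters scoring the same pair.
duplicated = [(key, y) for key, y in outcomes for _ in range(2)]
beta_dup, se_dup, _ = bt_coefficient_robust(duplicated)
```

With singleton clusters the robust error equals the usual model-based 1/sqrt(n·p·(1 - p)). Duplicating every judgment inside its cluster leaves it unchanged, whereas a naive per-judgment error would shrink by a factor of √2; absorbing that kind of dependence is what the clustering is for.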

Circularity Check

0 steps flagged

No circularity: formalization and external evaluation are independent


The paper defines the unified mapping f: C × L → R and introduces dual semantic projection spaces as the basis for UniRule, then reports outperformance via Bradley-Terry ranking on 12,000 external pairwise judgments. No equation or claim reduces the superiority result to an internal fit, self-citation chain, or definitional tautology; the validation metric is computed from independent judgments rather than from parameters fitted inside the projection or retrieval components. The limited scope of the 12 scenarios is a generalization risk but does not create circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that semantic distance in the two projection spaces is sufficient to characterize optimal rules and that an agentic RAG system can operationalize retrieval and generation without task-specific engineering.

axioms (1)
  • domain assumption: Optimal detection rules are characterized by semantic distance in dual projection spaces of intent and logic.
    Invoked when the problem is formalized as f: C × L → R and when semantic projection is said to validate the approach.
invented entities (1)
  • Dual semantic projection spaces (detection intent and detection logic): no independent evidence
    purpose: To decouple context from target language so that a single retrieval-generation pipeline works across all combinations.
    Newly introduced architectural component with no independent evidence supplied beyond the reported experiments.

pith-pipeline@v0.9.0 · 5449 in / 1386 out tokens · 72740 ms · 2026-05-10T16:22:03.977132+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 7 canonical work pages · 4 internal anchors

  1. Balasubramanian, P., Ali, T., Salmani, M., KhoshKholgh, D., Kostakos, P.: Hex2Sign: Automatic IDS signature generation from hexadecimal data using LLMs. In: 2024 IEEE International Conference on Big Data (BigData). pp. 4524–4532. IEEE (2024)

  2. Bradley, R.A., Terry, M.E.: Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39(3/4), 324–345 (1952)

  3. Chiang, W.L., Zheng, L., Sheng, Y., Angelopoulos, A.N., Li, T., Li, D., Zhu, B., Zhang, H., Jordan, M., Gonzalez, J.E., et al.: Chatbot Arena: An open platform for evaluating LLMs by human preference. In: Forty-first International Conference on Machine Learning (2024)

  4. Freedman, D.A.: On the so-called “Huber sandwich estimator” and “robust standard errors”. The American Statistician 60(4), 299–302 (2006)

  5. Hu, X., Chen, H., Bao, H., Wang, W., Liu, F., Zhou, G., Yin, P.: A LLM-based agent for the automatic generation and generalization of IDS rules. In: 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). pp. 1875–1880. IEEE (2024)

  6. Jimenez, C.E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., Narasimhan, K.: SWE-bench: Can language models resolve real-world GitHub issues? (2024), https://arxiv.org/abs/2310.06770

  7. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, pp. 159–174 (1977)

  8. Li, J., Chai, Y., Du, L., Duan, C., Yan, H., Gu, Z.: Gridai: Generating and repairing intrusion detection rules via collaboration among multiple LLM-based agents (2025), https://arxiv.org/abs/2510.13257

  9. Li, S., Ming, J., Qiu, P., Chen, Q., Liu, L., Bao, H., Wang, Q., Jia, C.: PackGenome: Automatically generating robust YARA rules for accurate malware packer detection. In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security. pp. 3078–3092 (2023)

  10. Liu, A., Mei, A., Lin, B., Xue, B., Wang, B., Xu, B., Wu, B., Zhang, B., Lin, C., Dong, C., et al.: DeepSeek-V3.2: Pushing the frontier of open large language models. arXiv preprint arXiv:2512.02556 (2025)

  11. Merrill, M.A., Shaw, A.G., Carlini, N., Li, B., Raj, H., Bercovich, I., Shi, L., Shin, J.Y., Walshe, T., Buchanan, E.K., Shen, J., Ye, G., Lin, H., Poulos, J., Wang, M., Nezhurina, M., Jitsev, J., Lu, D., Mastromichalakis, O.M., Xu, Z., Chen, Z., Liu, Y., Zhang, R., Chen, L.L., Kashyap, A., Uslu, J.L., Li, J., Wu, J., Yan, M., Bian, S., Sharma, V., Sun, K...

  12. Mitra, S., Bazarov, A., Duclos, M., Mittal, S., Piplai, A., Rahman, M.R., Zieglar, E., Rahimi, S.: Falcon: Autonomous cyber threat intelligence mining with LLMs for IDS rule generation. arXiv preprint arXiv:2508.18684 (2025)

  13. Newsome, J., Karp, B., Song, D.: Polygraph: Automatically generating signatures for polymorphic worms. In: 2005 IEEE Symposium on Security and Privacy (S&P’05). pp. 226–241. IEEE (2005)

  14. Schwartz, Y., Ben-Shimol, L., Mimran, D., Elovici, Y., Shabtai, A.: LLMCloudHunter: Harnessing LLMs for automated extraction of detection rules from cloud-based CTI. In: Proceedings of the ACM on Web Conference 2025. pp. 1922–1941 (2025)

  15. Stevens, K., Erdemir, M., Zhang, H., Kim, T., Pearce, P.: Blueprint: Automatic malware signature generation for internet scanning. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses. pp. 197–214 (2024)

  16. Tan, H.C., Cheh, C., Chen, B.: Cotoru: Automatic generation of network intrusion detection rules from code. In: IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. pp. 720–729. IEEE (2022)

  17. Uetz, R., Herzog, M., Hackländer, L., Schwarz, S., Henze, M.: You cannot escape me: Detecting evasions of SIEM rules in enterprise networks. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 5179–5196 (2024)

  18. Xu, M., Wang, H., Liu, J., Li, X., Yu, Z., Han, W., Lim, H.W., Dong, J.S., Zhang, J.: ThreatPilot: Attack-driven threat intelligence extraction (2025), https://arxiv.org/abs/2412.10872

  19. Zhang, X., Du, X., Chen, H., He, Y., Niu, W., Li, Q.: Automatically generating rules of malicious software packages via large language model. In: 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). pp. 734–747. IEEE (2025)

  20. Zhang, Y., Li, M., Long, D., Zhang, X., Lin, H., Yang, B., Xie, P., Yang, A., Liu, D., Lin, J., Huang, F., Zhou, J.: Qwen3 Embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025)