Recognition: unknown
From Context to Rules: Toward Unified Detection Rule Generation
Pith reviewed 2026-05-10 16:22 UTC · model grok-4.3
The pith
Dual semantic projections let one framework generate detection rules from any context in any language.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Detection rule generation can be formalized as the unified function f mapping from context C and language L to rules R, with optimal rules defined by minimal semantic distance; UniRule realizes this by retrieving from dual projection spaces of intent and logic, and the resulting rules are preferred over pure LLM outputs in 12,000 comparisons across three languages and four context types.
What carries the argument
Dual semantic projection spaces that separately encode detection intent and detection logic, allowing retrieval-augmented generation to produce rules for arbitrary inputs inside a single agentic framework.
If this is right
- Rule generation no longer requires a dedicated pipeline for each input-output combination.
- Semantic distance in the projected spaces can serve as a general criterion for selecting or ranking rules.
- The same retrieval and generation steps apply without modification when the input context or output language changes.
- Performance can be quantified uniformly across scenarios using pairwise preference data and Bradley-Terry modeling.
Where Pith is reading between the lines
- Security teams could maintain fewer custom scripts when writing rules for multiple languages or log formats.
- The same dual-space design might be tested on other structured generation tasks such as policy or compliance rule creation.
- If the projections remain lossless, the framework could be extended to generate rules directly from raw data streams rather than pre-processed contexts.
Load-bearing premise
The two semantic spaces can capture every relevant detail of any context and any target language without information loss or the need for extra adjustments.
What would settle it
A new context-language combination where expert raters consistently prefer rules produced by direct LLM prompting over those produced by the dual-projection system in head-to-head comparisons.
Figures
read the original abstract
Existing methods for detection rule generation are tightly coupled to specific input-output combinations, requiring dedicated pipelines for each. We formalize this problem as a unified mapping f:C*L->R and characterize optimal rules through semantic distance. We propose UniRule, an agentic RAG framework built on dual semantic projection spaces: detection intent and detection logic. This design enables retrieval and generation across arbitrary contexts and target languages within a single system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) show that UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52, validating semantic projection as an effective abstraction for unified rule generation. Together, the formalization, method, and evaluation provide an initial framework for studying detection rule generation as a unified task.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes detection rule generation as a unified mapping f:C*L->R characterized by semantic distance. It proposes UniRule, an agentic RAG framework using dual semantic projection spaces (detection intent and detection logic) to handle arbitrary contexts and target languages in one system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) claim UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52.
Significance. If validated, this provides a useful initial framework for treating detection rule generation as a single unified task rather than fragmented per-task pipelines. The formalization and scale of the human/AI comparison are strengths, though the evaluation lacks necessary methodological details.
major comments (1)
- [Experiments] Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address the single major comment below and will incorporate the requested details into a revised version of the paper.
read point-by-point responses
-
Referee: Experiments section: the abstract reports a Bradley-Terry coefficient of 0.52 and 12,000 pairwise comparisons but provides no information on baseline implementations, how the pairs were constructed, statistical testing, or inter-rater agreement; this information is required to substantiate the central outperformance claim.
Authors: We agree that the current manuscript does not include sufficient methodological details to fully substantiate the experimental claims. In the revised version, we will expand the Experiments section with: (1) explicit descriptions of the baseline (pure LLM generation without agentic RAG or dual projections), (2) the procedure for constructing the 12,000 pairwise comparisons, including how scenarios were sampled across the 3 languages and 4 context types, (3) the statistical testing approach used to evaluate the Bradley-Terry coefficient of 0.52 (including any significance tests or confidence intervals), and (4) inter-rater agreement metrics for the human/AI preference judgments. These additions will directly address the concern and strengthen the central outperformance result. revision: yes
Circularity Check
No circularity: formalization and external evaluation are independent
full rationale
The paper defines the unified mapping f:C*L->R and introduces dual semantic projection spaces as the basis for UniRule, then reports outperformance via Bradley-Terry ranking on 12,000 external pairwise judgments. No equation or claim reduces the superiority result to an internal fit, self-citation chain, or definitional tautology; the validation metric is computed from independent judgments rather than from parameters fitted inside the projection or retrieval components. The limited scope of the 12 scenarios is a generalization risk but does not create circularity in the derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Optimal detection rules are characterized by semantic distance in dual projection spaces of intent and logic.
invented entities (1)
-
Dual semantic projection spaces (detection intent and detection logic)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
In: 2024 IEEE International Conference on Big Data (BigData)
Balasubramanian, P., Ali, T., Salmani, M., KhoshKholgh, D., Kostakos, P.: Hex2sign: Automatic ids signature generation from hexadecimal data using llms. In: 2024 IEEE International Conference on Big Data (BigData). pp. 4524–4532. IEEE (2024)
2024
-
[2]
the method of paired comparisons
Bradley, R.A., Terry, M.E.: Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika39(3/4), 324–345 (1952)
1952
-
[3]
In: Forty-first International Conference on Machine Learning (2024)
Chiang, W.L., Zheng, L., Sheng, Y., Angelopoulos, A.N., Li, T., Li, D., Zhu, B., Zhang, H., Jordan, M., Gonzalez, J.E., et al.: Chatbot arena: An open platform for evaluating llms by human preference. In: Forty-first International Conference on Machine Learning (2024)
2024
-
[4]
huber sandwich estimator
Freedman, D.A.: On the so-called “huber sandwich estimator” and “robust standard errors”. The American Statistician60(4), 299–302 (2006)
2006
-
[5]
In: 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Hu, X., Chen, H., Bao, H., Wang, W., Liu, F., Zhou, G., Yin, P.: A llm-based agent for the automatic generation and generalization of ids rules. In: 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). pp. 1875–1880. IEEE (2024)
2024
-
[6]
Jimenez, C.E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., Narasimhan, K.: Swe-bench: Can language models resolve real-world github issues? (2024),https: //arxiv.org/abs/2310.06770
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[7]
biometrics pp
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. biometrics pp. 159–174 (1977)
1977
- [8]
-
[9]
In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Commu- nications Security
Li, S., Ming, J., Qiu, P., Chen, Q., Liu, L., Bao, H., Wang, Q., Jia, C.: Packgenome: Automatically generating robust yara rules for accurate malware packer detection. In: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Commu- nications Security. pp. 3078–3092 (2023)
2023
-
[10]
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Liu, A., Mei, A., Lin, B., Xue, B., Wang, B., Xu, B., Wu, B., Zhang, B., Lin, C., Dong, C., et al.: Deepseek-v3. 2: Pushing the frontier of open large language models. arXiv preprint arXiv:2512.02556 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[11]
Merrill, M.A., Shaw, A.G., Carlini, N., Li, B., Raj, H., Bercovich, I., Shi, L., Shin, J.Y., Walshe, T., Buchanan, E.K., Shen, J., Ye, G., Lin, H., Poulos, J., Wang, M., Nezhurina, M., Jitsev, J., Lu, D., Mastromichalakis, O.M., Xu, Z., Chen, Z., Liu, Y., Zhang, R., Chen, L.L., Kashyap, A., Uslu, J.L., Li, J., Wu, J., Yan, M., Bian, S., Sharma, V., Sun, K...
work page internal anchor Pith review arXiv 2026
-
[12]
arXiv preprint arXiv:2508.18684 (2025)
Mitra, S., Bazarov, A., Duclos, M., Mittal, S., Piplai, A., Rahman, M.R., Zieglar, E., Rahimi, S.: Falcon: Autonomous cyber threat intelligence mining with llms for ids rule generation. arXiv preprint arXiv:2508.18684 (2025)
-
[13]
In: 2005 IEEE Symposium on Security and Privacy (S&P’05)
Newsome, J., Karp, B., Song, D.: Polygraph: Automatically generating signa- tures for polymorphic worms. In: 2005 IEEE Symposium on Security and Privacy (S&P’05). pp. 226–241. IEEE (2005)
2005
-
[14]
In: Proceedings of the ACM on Web Conference 2025
Schwartz, Y., Ben-Shimol, L., Mimran, D., Elovici, Y., Shabtai, A.: Llmcloud- hunter: Harnessing llms for automated extraction of detection rules from cloud- based cti. In: Proceedings of the ACM on Web Conference 2025. pp. 1922–1941 (2025)
2025
-
[15]
In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses
Stevens, K., Erdemir, M., Zhang, H., Kim, T., Pearce, P.: Blueprint: Automatic malware signature generation for internet scanning. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses. pp. 197–214 (2024)
2024
-
[16]
In: IEEE INFOCOM 2022-IEEE Conference on Computer Communications
Tan, H.C., Cheh, C., Chen, B.: Cotoru: automatic generation of network intru- sion detection rules from code. In: IEEE INFOCOM 2022-IEEE Conference on Computer Communications. pp. 720–729. IEEE (2022)
2022
-
[17]
In: 33rd USENIX Security Symposium (USENIX Security 24)
Uetz, R., Herzog, M., Hackländer, L., Schwarz, S., Henze, M.: You cannot escape me: Detecting evasions of{SIEM}rules in enterprise networks. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 5179–5196 (2024)
2024
- [18]
-
[19]
In: 2025 55th An- nual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Zhang, X., Du, X., Chen, H., He, Y., Niu, W., Li, Q.: Automatically generating rules of malicious software packages via large language model. In: 2025 55th An- nual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). pp. 734–747. IEEE (2025)
2025
-
[20]
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Zhang, Y., Li, M., Long, D., Zhang, X., Lin, H., Yang, B., Xie, P., Yang, A., Liu, D., Lin, J., Huang, F., Zhou, J.: Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.