pith. machine review for the scientific record.

arxiv: 2604.06762 · v1 · submitted 2026-04-08 · 💻 cs.CR

Recognition: unknown

ARuleCon: Agentic Security Rule Conversion

Hongtai Wang, Hoon Wei Lim, Jiaheng Zhang, Jin Song Dong, Ming Xu, Weili Han, Yanpei Guo, Zhengmin Yu

Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3

classification 💻 cs.CR
keywords: SIEM · rule conversion · agentic systems · security rules · cross-platform · intrusion detection · SPL · KQL

The pith

ARuleCon converts SIEM security rules between vendor formats using agents and runtime checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Security rules written for one SIEM platform cannot be directly reused on another because each vendor uses its own query language and schema. ARuleCon addresses this by deploying agents that translate the rule logic while managing format and schema differences, then confirms correctness by executing both the original and converted rules side by side in controlled test environments. The method removes the need for experts to manually reinterpret rules or study target documentation. If successful, it preserves years of invested detection logic and lowers the cost of switching or operating multiple SIEM systems.

Core claim

ARuleCon is an agentic framework that converts SIEM rules between vendor languages, for example from Splunk SPL to Microsoft KQL, handling conversion and schema mismatches automatically. A Python-based consistency check then runs both the source and target rules against shared test data to catch semantic drift, and the framework achieves 15 percent higher average fidelity than direct LLM prompting across textual alignment and execution success.
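To make the verification step concrete, Listing A is a minimal editorial sketch of the differential check, not ARuleCon's implementation: both rules are reduced to executable Python predicates over the same synthetic event log, and their alert sets are compared. The event schema, the sliding-window semantics, and all field names are assumptions made for illustration.

Listing A (editorial sketch, Python):

    # Differential consistency check, sketched: run both rules on shared
    # test events and diff the alert sets. Schema and rules are assumed.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Event:
        ts: int          # seconds since epoch
        src_ip: str
        event_type: str

    def sliding_window_alerts(events, window_s, threshold):
        """Alert on any src_ip with more than `threshold` login failures
        inside any sliding window of `window_s` seconds."""
        by_ip = defaultdict(list)
        for e in events:
            if e.event_type == "login_failure":
                by_ip[e.src_ip].append(e.ts)
        alerts = set()
        for ip, stamps in by_ip.items():
            stamps.sort()
            lo = 0
            for hi in range(len(stamps)):
                while stamps[hi] - stamps[lo] > window_s:
                    lo += 1
                if hi - lo + 1 > threshold:
                    alerts.add(ip)
        return alerts

    def source_rule(events):   # stands in for executing the original SPL rule
        return sliding_window_alerts(events, window_s=1800, threshold=5)

    def target_rule(events):   # stands in for executing the converted KQL rule
        return sliding_window_alerts(events, window_s=1800, threshold=5)

    def consistency_check(events):
        src, tgt = source_rule(events), target_rule(events)
        return {"consistent": src == tgt,
                "missed": src - tgt,       # alerts lost in conversion
                "spurious": tgt - src}     # alerts invented by conversion

    if __name__ == "__main__":
        burst = [Event(ts=i * 60, src_ip="10.0.0.1", event_type="login_failure")
                 for i in range(8)]
        print(consistency_check(burst))   # consistent; no missed or spurious alerts

The real system presumably executes the rules on actual SIEM backends rather than Python re-implementations; the transferable idea is the comparison of alert sets, not the emulation.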

What carries the argument

Agentic pipeline that combines LLM-driven translation with mismatch resolution and parallel execution verification in sandboxed test environments.
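Read as pseudocode, that pipeline is a translate/verify loop. Listing B sketches one plausible shape for it; the function names, signatures, and retry budget are editorial assumptions, not ARuleCon's published API, and the agent calls are stubbed so the sketch runs end to end.

Listing B (editorial sketch, Python):

    # Hypothetical shape of the agentic translate/verify loop; all names
    # and the retry budget are assumptions, not ARuleCon's actual API.

    def llm_translate(rule: str, target: str, feedback: dict | None) -> str:
        """Stub for the translation agent (would prompt an LLM in practice)."""
        return rule  # identity stand-in so the loop runs

    def resolve_schema(rule: str, target: str) -> str:
        """Stub for the mismatch-resolution agent (field/table remapping)."""
        return rule

    def run_consistency_check(source: str, converted: str) -> dict:
        """Stub for the Python-based check: execute both rules on shared test data."""
        return {"consistent": True, "missed": set(), "spurious": set()}

    def convert_rule(source_rule: str, target: str = "kql", max_attempts: int = 3) -> str:
        feedback = None
        for _ in range(max_attempts):
            draft = llm_translate(source_rule, target, feedback)  # translation agent
            draft = resolve_schema(draft, target)                 # schema agent
            report = run_consistency_check(source_rule, draft)    # sandboxed parallel execution
            if report["consistent"]:
                return draft
            feedback = report  # detected mismatches steer the next attempt
        raise RuntimeError(f"no consistent conversion after {max_attempts} attempts")

    print(convert_rule('search index=auth event_type="login_failure"'))

The design point worth noticing is the feedback edge: failed consistency reports are routed back into the translator, which is what distinguishes an agentic loop from one-shot prompting.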

If this is right

  • Security teams can reuse existing detection rules when migrating between SIEM platforms.
  • The time required to understand cross-vendor documentation and remap logic drops substantially.
  • Rule conversion success rates rise compared with direct use of large language models.
  • Proven detection logic retains its value across heterogeneous environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent-plus-verification pattern could be tested on additional SIEM languages beyond the ones evaluated.
  • Embedding the consistency checks into live data pipelines might allow continuous validation after deployment.
  • Similar methods may help translate other security artifacts such as alert playbooks or compliance queries.

Load-bearing premise

That executing the source and converted rules against the same test data will reliably expose any important behavioral differences.

What would settle it

A set of real-world events where the converted rule produces a different set of alerts than the original rule despite both passing the consistency checks.
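A toy instance of how that could happen: two window semantics that agree on ordinary test data but diverge when an event lands exactly on the window boundary. Listing C is an editorial illustration with assumed names and data; it is not drawn from the paper.

Listing C (editorial sketch, Python):

    # Inclusive vs. exclusive 30-minute window: a drift coarse test data misses.
    def count_in_window(stamps, window_s, inclusive):
        start = stamps[0]
        if inclusive:
            return sum(1 for t in stamps if t - start <= window_s)
        return sum(1 for t in stamps if t - start < window_s)

    def fires(stamps, inclusive):
        return count_in_window(stamps, window_s=1800, inclusive=inclusive) > 5

    # Test data comfortably inside the window: both variants agree, check passes.
    dense = [0, 60, 120, 180, 240, 300]
    assert fires(dense, inclusive=True) == fires(dense, inclusive=False)

    # Sixth failure exactly at the 30-minute mark: the variants now disagree.
    boundary = [0, 60, 120, 180, 240, 1800]
    print(fires(boundary, inclusive=True),   # True:  the alert fires
          fires(boundary, inclusive=False))  # False: the alert is silently lost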

Figures

Figures reproduced from arXiv: 2604.06762 by Hongtai Wang, Hoon Wei Lim, Jiaheng Zhang, Jin Song Dong, Ming Xu, Weili Han, Yanpei Guo, Zhengmin Yu.

Figure 1. Motivation scenarios [29, 40]: industrial SOCs are plagued by numerous rule query languages that share a first-order-logic backbone. A single clause can be overloaded to express grouping, filtering, and thresholding at once, as in the paper's Listing 1: events: filter: event.type = "login_failure"; aggregate: count() > 5 by src_ip within 30m. This compact rule fires when more than five login failures from one src_ip fall within 30 minutes; an SPL/KQL rendering follows the figure list.
Figure 2. Overview pipeline of ARuleCon.
Figure 3. Each component contributes to the performance.
Figure 4. Fine-grained conversion evaluation at the semantic level: redder regions highlight systematic improvements, blue regions the opposite.
Figure 5. Consistency between LLM-as-a-Judge and human evaluation.
Figure 7. Ablation study of semantic metrics on ARuleCon.
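For readers who know only one of the dialects, Listing 1's threshold rule would render roughly as follows in SPL and KQL. Listing D keeps both queries as Python strings for side-by-side comparison; the index, table, and field names (auth, AuthEvents, SrcIp, TimeGenerated) are editorial assumptions, not the paper's rule pairs, and the 30-minute bin is a tumbling-window approximation of the rule's window.

Listing D (editorial sketch, Python):

    # The same detection in two dialects (assumed schemas, tumbling 30 m bins).
    spl_rule = '''
    index=auth event_type="login_failure"
    | bin _time span=30m
    | stats count AS failures BY src_ip, _time
    | where failures > 5
    '''

    kql_rule = '''
    AuthEvents
    | where EventType == "login_failure"
    | summarize Failures = count() by SrcIp, bin(TimeGenerated, 30m)
    | where Failures > 5
    '''

Even in this small example the conversion is not token-by-token: SPL buckets time with an explicit bin-then-stats pair, while KQL folds the bucketing into summarize. That is exactly the kind of structural mismatch the agents have to negotiate.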
Original abstract

Security Information and Event Management (SIEM) systems detect intrusions and anomalies in real time by applying security rules. However, the heterogeneity of vendor-specific rule languages (e.g., Splunk SPL, Microsoft KQL, IBM AQL, Google YARA-L, and RSA ESA) makes cross-platform rule reuse extremely difficult, requiring deep domain knowledge for reliable conversion. An autonomous and accurate rule-conversion framework can therefore yield significant effort savings while preserving the value of existing rules. In this paper, we propose ARuleCon, an agentic SIEM-rule conversion approach. With ARuleCon, security professionals need neither to distill the source rules' logic nor to study the target vendor's documentation: ARuleCon converts rules to the target vendor's language without further intervention. To achieve this, ARuleCon is equipped with agents that resolve conversion and schema mismatches, and with a Python-based consistency check that runs both source and target rules in controlled test environments to mitigate subtle semantic drift. We present a comprehensive evaluation of ARuleCon covering textual alignment and execution success, showing that ARuleCon converts rules with high fidelity and outperforms the baseline LLM by 15% on average. Finally, we report case studies and interviews with our industry collaborators at Singtel Singapore, which show that ARuleCon significantly saves experts' time in understanding cross-SIEM documentation and remapping logic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ARuleCon, an agentic framework for converting SIEM security rules across heterogeneous vendors (Splunk SPL, Microsoft KQL, IBM AQL, Google YARA-L, RSA ESA). It uses LLM-based agents to handle conversion and schema mismatches without requiring users to distill source logic or target documentation, augmented by a Python-based consistency check that executes both source and target rules in controlled test environments. The central claims are high-fidelity conversion (textual alignment plus execution success) that outperforms a baseline LLM by 15% on average, plus time savings demonstrated via case studies and interviews with Singtel collaborators.

Significance. If the evaluation claims hold under rigorous testing, ARuleCon would offer a practical advance in automating cross-SIEM rule reuse, reducing expert effort in security operations centers where rule heterogeneity currently blocks reuse. The combination of agentic conversion with execution-based verification is a concrete step beyond pure prompting approaches; reproducible code or open test suites would further strengthen its impact.

major comments (2)
  1. [Abstract / Evaluation] Abstract and Evaluation section: the claim of a 15% average improvement in fidelity is presented without any reported test-set size, rule-selection criteria, number of conversion pairs, statistical tests, confidence intervals, or precise baseline prompting strategy (including temperature, few-shot examples, or model version). This prevents verification of the data-to-claim link and makes the performance delta impossible to reproduce or compare.
  2. [ARuleCon framework / Evaluation] Consistency-check description (Python-based execution in controlled environments): the method assumes that running source and target rules on the same test data will detect semantic drifts, yet the manuscript provides no evidence that the test scenarios cover edge cases such as null-value handling, time-window boundary conditions, aggregation over sparse logs, or vendor-specific schema quirks. Without such coverage metrics or failure-case analysis, the high-fidelity claim rests on an unverified assumption.
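A coverage suite answering this comment might look like Listing E: deliberately generated edge-case logs (null fields, exact window boundaries, sparse streams, empty input) run through both rules differentially. Everything in the sketch is hypothetical; the paper does not publish its test generator.

Listing E (editorial sketch, Python):

    # Hypothetical edge-case generator for the failure modes named above.
    WINDOW_S = 1800

    def edge_case_logs():
        """Yield small event logs targeting known conversion failure modes."""
        base = {"event_type": "login_failure", "src_ip": "10.0.0.1"}
        yield [{**base, "ts": 0, "src_ip": None}]                 # null field handling
        yield [{**base, "ts": t} for t in (0, WINDOW_S)]          # exact window boundary
        yield [{**base, "ts": t} for t in (0, WINDOW_S - 1)]      # just inside the window
        yield [{**base, "ts": 7200 * i} for i in range(6)]        # sparse: one event per 2 h
        yield []                                                   # empty log

    def differential_suite(source_rule, target_rule):
        """Return the indices of edge cases on which the two rules disagree."""
        return [i for i, events in enumerate(edge_case_logs())
                if source_rule(events) != target_rule(events)]

    if __name__ == "__main__":
        trivial = lambda events: len(events) > 0
        print(differential_suite(trivial, trivial))  # []: identical rules never diverge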
minor comments (2)
  1. [Introduction] Notation for rule languages (SPL, KQL, etc.) is introduced without a compact comparison table; adding one would improve readability for readers unfamiliar with all five dialects.
  2. [Case studies] The industry-interview protocol and participant count are mentioned only in passing; a brief description of interview questions or anonymized quotes would strengthen the qualitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have addressed each major comment below and revised the paper to improve the transparency and rigor of the evaluation.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim of a 15% average improvement in fidelity is presented without any reported test-set size, rule-selection criteria, number of conversion pairs, statistical tests, confidence intervals, or precise baseline prompting strategy (including temperature, few-shot examples, or model version). This prevents verification of the data-to-claim link and makes the performance delta impossible to reproduce or compare.

    Authors: We agree that these experimental details were insufficiently reported in the original manuscript, which limits reproducibility. We have revised the Evaluation section to include the test-set size and rule-selection criteria, the number of conversion pairs evaluated, the statistical tests and confidence intervals used, and the precise baseline prompting strategy (model version, temperature, and few-shot examples). The test rules and prompts have also been added to the supplementary material to enable independent verification of the 15% average improvement. revision: yes

  2. Referee: [ARuleCon framework / Evaluation] Consistency-check description (Python-based execution in controlled environments): the method assumes that running source and target rules on the same test data will detect semantic drifts, yet the manuscript provides no evidence that the test scenarios cover edge cases such as null-value handling, time-window boundary conditions, aggregation over sparse logs, or vendor-specific schema quirks. Without such coverage metrics or failure-case analysis, the high-fidelity claim rests on an unverified assumption.

    Authors: We acknowledge that the original consistency-check description lacked explicit discussion of edge-case coverage and failure analysis. We have revised the manuscript to add a dedicated subsection describing the test scenarios, which now explicitly include null-value handling, time-window boundaries, sparse log aggregation, and vendor-specific schema differences. We also include a failure-case analysis with concrete examples of detected semantic drifts. While exhaustive coverage of all possible edge cases is not feasible, these additions provide evidence supporting the check's effectiveness. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external execution checks

full rationale

The paper's central claims of high-fidelity rule conversion (15% average improvement over baseline LLM) are supported by empirical evaluation metrics: textual alignment, execution success rates, and a Python-based consistency check that executes both source and target rules in controlled test environments. These steps constitute external validation against observable behavior rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. No equations, ansatzes, or uniqueness theorems are invoked that reduce the result to the inputs by construction. The chain from evidence to claim is therefore free of circular dependencies.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unproven assumption that current LLMs plus execution tests can faithfully preserve rule semantics across vendor languages; this is a domain assumption about AI capability rather than a derived result.

axioms (2)
  • domain assumption Large language models can accurately parse and rewrite security rule logic across heterogeneous vendor syntaxes without human intervention.
    Invoked when the paper states that professionals do not need to distill source logic or study target documentation.
  • domain assumption Execution of rules in controlled Python environments can detect and correct subtle semantic drifts introduced during conversion.
    Central to the consistency-check component described in the abstract.
invented entities (1)
  • ARuleCon agentic conversion framework (no independent evidence)
    purpose: Autonomous translation of SIEM rules with built-in verification
    The proposed system itself; no independent falsifiable evidence outside the paper is supplied.

pith-pipeline@v0.9.0 · 5570 in / 1470 out tokens · 65299 ms · 2026-05-10T17:37:38.885904+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 13 canonical work pages · 4 internal anchors

  1. [1] 2022. meta-llama/Llama-3.1-405B · Hugging Face. https://huggingface.co/meta-llama/Llama-3.1-405B

  2. [2] 2022. text-embedding-ada-002. https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

  3. [3] 2024. deepseek-ai/DeepSeek-V3 · Hugging Face. https://huggingface.co/deepseek-ai/DeepSeek-V3

  4. [4] Andi Albrecht and contributors. [n. d.]. sqlparse: non-validating SQL parser for Python. https://pypi.org/project/sqlparse/

  5. [5] Ariel Query Language Guide. [n. d.]. https://www.ibm.com/docs/en/SS42VS_7.4/pdf/b_qradar_aql.pdf

  6. [6] Sandeep N. Bhatt, Pratyusa K. Manadhata, and Loai Zomlot. 2014. The Operational Role of Security Information and Event Management Systems. IEEE Secur. Priv. 12, 5 (2014), 35–41. doi:10.1109/MSP.2014.103

  7. [7] Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, and Xueyuan Han. 2023. Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance. CoRR abs/2308.05034 (2023). arXiv:2308.05034 doi:10.48550/ARXIV.2308.05034

  8. [8] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevo...

  9. [9] Google Cloud. [n. d.]. YARA-L 2.0 language syntax. https://cloud.google.com/chronicle/docs/detection/yara-l-2-0-syntax

  10. [10] Google Security Operations (SecOps). [n. d.]. Google SecOps' cloud-native security operations platform empowers security teams to better detect, investigate, and respond to cybersecurity threats. https://cloud.google.com/security/products/security-operations?hl=en

  11. [11] Google YARA-L Rules. [n. d.]. Google YARA-L Rule Dataset. https://github.com/chronicle/detection-rules/tree/main/rules/community

  12. [12] Wajih Ul Hassan, Adam Bates, and Daniel Marino. 2020. Tactical Provenance Analysis for Endpoint Detection and Response Systems. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020. IEEE, 1172–1189. doi:10.1109/SP40000.2020.00096

  13. [13] IBM. 2024. SPL To KQL Application.

  14. [14] IBM AQL Rules. [n. d.]. IBM QRadar AQL Rule Dataset. https://github.com/Xboarder56/QRCE-Rules

  15. [15] Cohen's Kappa. [n. d.]. A measure of agreement between two dependent categorical samples. https://datatab.net/tutorial/cohens-kappa

  16. [16] Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, and Yiqun Liu. 2024. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods. arXiv:2412.05579 [cs.CL] https://arxiv.org/abs/2412.05579

  17. [17] Ruishi Li, Bo Wang, Tianyu Li, Prateek Saxena, and Ashish Kundu. 2025. Translating C To Rust: Lessons from a User Study. In 32nd Annual Network and Distributed System Security Symposium, NDSS 2025, San Diego, California, USA, February 24-28, 2025. The Internet Society. https://www.ndss-symposium.org/ndss-paper/translating-c-to-rust-lessons-from-a-us...

  18. [18] Jiawei Liu, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, and Gang Wang. 2025. PurpCode: Reasoning for Safer Code Generation. arXiv:2507.19060 [cs.CR] https://arxiv.org/abs/2507.19060

  19. [19] Zhengxiong Luo, Huan Zhao, Dylan Wolff, Cristian Cadar, and Abhik Roychoudhury. 2026. Agentic Concolic Execution. In 2026 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 1–19.

  20. [20] Microsoft. 2025. Microsoft Sentinel - Cloud-native SIEM Solution | Microsoft Azure. https://azure.microsoft.com/en-us/products/microsoft-sentinel

  21. [21] Microsoft. [n. d.]. Kusto Query Language overview. https://learn.microsoft.com/en-us/kusto/query/?view=microsoft-fabric

  22. [22] Microsoft KQL Rules. [n. d.]. Microsoft Sentinel KQL Rule Dataset. https://github.com/Azure/Azure-Sentinel/tree/master/Detections

  23. [23] Amadou Latyr Ngom and Tim Kraska. 2024. Mallet: SQL Dialect Translation with LLM Rule Generation. In Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2024, Santiago, Chile, 14 June 2024. ACM, 3:1–3:5. doi:10.1145/3663742.3663973

  24. [24] Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Zhi Jin, and Jin Song Dong. 2025. Intention-driven generation of project-specific test cases. arXiv preprint arXiv:2507.20619 (2025).

  25. [25] Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. CoRR abs/2009.10297 (2020). arXiv:2009.10297 https://arxiv.org/abs/2009.10297

  26. [26] RSA ESA Rules. [n. d.]. RSA NetWitness ESA Rule Dataset. https://github.com/netwitness/nw-esa

  27. [27] Rule Syntax. [n. d.]. NetWitness Community. https://community.netwitness.com/s/article/RuleSyntax

  28. [28] Stuart J. Russell and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Pearson.

  29. [29] SIEM Migration. [n. d.]. SIEM Migration: Challenges and Strategies. https://cardinalops.com/blog/siem-migration-challenges-strategies/

  30. [30] Kirill Simonov and contributors. [n. d.]. PyYAML: YAML parser and emitter for Python. https://pypi.org/project/PyYAML/

  31. [31] Splunk. [n. d.]. Splunk Enterprise Search Reference. https://help.splunk.com/en/splunk-enterprise/search/spl-search-reference/9.4/introduction/welcome-to-the-search-reference

  32. [32] Splunk (a Cisco company). [n. d.]. State of Security 2024: The Race to Harness AI. https://shorturl.at/dgTsP

  33. [33] Splunk SPL Rules. [n. d.]. Splunk SPL Rule Dataset. https://github.com/splunk/security_content/tree/develop/detections

  34. [34] Threat Report. [n. d.]. 67 Cyber Security Statistics, Facts and Trends: Data on Attacks, Breaches and Threats for 2025. https://www.cloudwards.net/cyber-security-statistics/

  35. [35] Rafael Uetz, Marco Herzog, Louis Hackländer, Simon Schwarz, and Martin Henze. 2023. You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks. CoRR abs/2311.10197 (2023). arXiv:2311.10197 doi:10.48550/ARXIV.2311.10197

  36. [36] Sapienza, J.

  37. [37] Mati Ur Rehman, Hadi Ahmadi, and Wajih Ul Hassan. 2024. Flash: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning. In 2024 IEEE Symposium on Security and Privacy (SP). 3552–3570. doi:10.1109/SP54263.2024.00139

  38. [38] VirusTotal. [n. d.]. yara-python: The Python interface for YARA. https://pypi.org/project/yara-python/

  39. [39] Hongtai Wang, Ming Xu, Yanpei Guo, Weili Han, Hoon Wei Lim, and Jin Song Dong. 2025. RulePilot: An LLM-Powered Agent for Security Rule Generation. CoRR abs/2511.12224 (2025). arXiv:2511.12224 doi:10.48550/ARXIV.2511.12224

  40. [40] Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, and Sida I. Wang. 2025. SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution. arXiv:2502.18449 [cs.SE] https://arxiv.org/abs/2502.18449

  41. [41] Why SIEM Migration. [n. d.]. Demystifying SIEM migration: Pitfalls to avoid and tips for ensuring success.

  42. [42] Pengfei Wu, Jianting Ning, Wu Luo, Xinyi Huang, and Debiao He. 2021. Exploring dynamic task loading in SGX-based distributed computing. IEEE Transactions on Services Computing 16, 1 (2021), 288–301.

  43. [43] Wei Zhou, Guoliang Li, Haoyu Wang, Yuxing Han, Xufei Wu, Fan Wu, and Xuanhe Zhou. 2025. PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation. arXiv:2509.23338 [cs.DB] https://arxiv.org/abs/2509.23338