pith. machine review for the scientific record.

arxiv: 2604.06762 · v1 · submitted 2026-04-08 · 💻 cs.CR

Recognition: unknown

ARuleCon: Agentic Security Rule Conversion

Hongtai Wang, Hoon Wei Lim, Jiaheng Zhang, Jin Song Dong, Ming Xu, Weili Han, Yanpei Guo, Zhengmin Yu

Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3

classification 💻 cs.CR
keywords: SIEM · rule conversion · agentic systems · security rules · cross-platform · intrusion detection · SPL · KQL

The pith

ARuleCon converts SIEM security rules between vendor formats using agents and runtime checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Security rules written for one SIEM platform cannot be directly reused on another because each vendor uses its own query language and schema. ARuleCon addresses this by deploying agents that translate the rule logic while managing format and schema differences, then confirms correctness by executing both the original and converted rules side by side in controlled test environments. The method removes the need for experts to manually reinterpret rules or study target documentation. If successful, it preserves years of invested detection logic and lowers the cost of switching or operating multiple SIEM systems.

Core claim

ARuleCon is an agentic framework that converts SIEM rules between vendor languages, for example from Splunk SPL to Microsoft KQL, handling conversion and schema mismatches automatically. A Python-based consistency check then runs both the source and target rules against shared test data to catch semantic drift, and the framework achieves 15 percent higher average fidelity than direct LLM prompting across textual alignment and execution success.
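To make the verification step concrete, Listing A is a minimal editorial sketch of the differential check, not ARuleCon's implementation: both rules are reduced to executable Python predicates over the same synthetic event log, and their alert sets are compared. The event schema, the sliding-window semantics, and all field names are assumptions made for illustration.

Listing A (editorial sketch, Python):

    # Differential consistency check, sketched: run both rules on shared
    # test events and diff the alert sets. Schema and rules are assumed.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Event:
        ts: int          # seconds since epoch
        src_ip: str
        event_type: str

    def sliding_window_alerts(events, window_s, threshold):
        """Alert on any src_ip with more than `threshold` login failures
        inside any sliding window of `window_s` seconds."""
        by_ip = defaultdict(list)
        for e in events:
            if e.event_type == "login_failure":
                by_ip[e.src_ip].append(e.ts)
        alerts = set()
        for ip, stamps in by_ip.items():
            stamps.sort()
            lo = 0
            for hi in range(len(stamps)):
                while stamps[hi] - stamps[lo] > window_s:
                    lo += 1
                if hi - lo + 1 > threshold:
                    alerts.add(ip)
        return alerts

    def source_rule(events):   # stands in for executing the original SPL rule
        return sliding_window_alerts(events, window_s=1800, threshold=5)

    def target_rule(events):   # stands in for executing the converted KQL rule
        return sliding_window_alerts(events, window_s=1800, threshold=5)

    def consistency_check(events):
        src, tgt = source_rule(events), target_rule(events)
        return {"consistent": src == tgt,
                "missed": src - tgt,       # alerts lost in conversion
                "spurious": tgt - src}     # alerts invented by conversion

    if __name__ == "__main__":
        burst = [Event(ts=i * 60, src_ip="10.0.0.1", event_type="login_failure")
                 for i in range(8)]
        print(consistency_check(burst))   # consistent; no missed or spurious alerts

The real system presumably executes the rules on actual SIEM backends rather than Python re-implementations; the transferable idea is the comparison of alert sets, not the emulation.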

What carries the argument

Agentic pipeline that combines LLM-driven translation with mismatch resolution and parallel execution verification in sandboxed test environments.
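Read as pseudocode, that pipeline is a translate/verify loop. Listing B sketches one plausible shape for it; the function names, signatures, and retry budget are editorial assumptions, not ARuleCon's published API, and the agent calls are stubbed so the sketch runs end to end.

Listing B (editorial sketch, Python):

    # Hypothetical shape of the agentic translate/verify loop; all names
    # and the retry budget are assumptions, not ARuleCon's actual API.

    def llm_translate(rule: str, target: str, feedback: dict | None) -> str:
        """Stub for the translation agent (would prompt an LLM in practice)."""
        return rule  # identity stand-in so the loop runs

    def resolve_schema(rule: str, target: str) -> str:
        """Stub for the mismatch-resolution agent (field/table remapping)."""
        return rule

    def run_consistency_check(source: str, converted: str) -> dict:
        """Stub for the Python-based check: execute both rules on shared test data."""
        return {"consistent": True, "missed": set(), "spurious": set()}

    def convert_rule(source_rule: str, target: str = "kql", max_attempts: int = 3) -> str:
        feedback = None
        for _ in range(max_attempts):
            draft = llm_translate(source_rule, target, feedback)  # translation agent
            draft = resolve_schema(draft, target)                 # schema agent
            report = run_consistency_check(source_rule, draft)    # sandboxed parallel execution
            if report["consistent"]:
                return draft
            feedback = report  # detected mismatches steer the next attempt
        raise RuntimeError(f"no consistent conversion after {max_attempts} attempts")

    print(convert_rule('search index=auth event_type="login_failure"'))

The design point worth noticing is the feedback edge: failed consistency reports are routed back into the translator, which is what distinguishes an agentic loop from one-shot prompting.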

If this is right

  • Security teams can reuse existing detection rules when migrating between SIEM platforms.
  • The time required to understand cross-vendor documentation and remap logic drops substantially.
  • Rule conversion success rates rise compared with direct use of large language models.
  • Proven detection logic retains its value across heterogeneous environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent-plus-verification pattern could be tested on additional SIEM languages beyond the ones evaluated.
  • Embedding the consistency checks into live data pipelines might allow continuous validation after deployment.
  • Similar methods may help translate other security artifacts such as alert playbooks or compliance queries.

Load-bearing premise

That executing the source and converted rules against the same test data will reliably expose any important behavioral differences.

What would settle it

A set of real-world events where the converted rule produces a different set of alerts than the original rule despite both passing the consistency checks.
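A toy instance of how that could happen: two window semantics that agree on ordinary test data but diverge when an event lands exactly on the window boundary. Listing C is an editorial illustration with assumed names and data; it is not drawn from the paper.

Listing C (editorial sketch, Python):

    # Inclusive vs. exclusive 30-minute window: a drift coarse test data misses.
    def count_in_window(stamps, window_s, inclusive):
        start = stamps[0]
        if inclusive:
            return sum(1 for t in stamps if t - start <= window_s)
        return sum(1 for t in stamps if t - start < window_s)

    def fires(stamps, inclusive):
        return count_in_window(stamps, window_s=1800, inclusive=inclusive) > 5

    # Test data comfortably inside the window: both variants agree, check passes.
    dense = [0, 60, 120, 180, 240, 300]
    assert fires(dense, inclusive=True) == fires(dense, inclusive=False)

    # Sixth failure exactly at the 30-minute mark: the variants now disagree.
    boundary = [0, 60, 120, 180, 240, 1800]
    print(fires(boundary, inclusive=True),   # True:  the alert fires
          fires(boundary, inclusive=False))  # False: the alert is silently lost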

Figures

Figures reproduced from arXiv: 2604.06762 by Hongtai Wang, Hoon Wei Lim, Jiaheng Zhang, Jin Song Dong, Ming Xu, Weili Han, Yanpei Guo, Zhengmin Yu.

Figure 1. Motivation scenarios [29, 40]: industrial SOCs are plagued by numerous rule query languages that share a first-order-logic backbone. A single clause can be overloaded to express grouping, filtering, and thresholding at once, as in the paper's Listing 1: events: filter: event.type = "login_failure"; aggregate: count() > 5 by src_ip within 30m. This compact rule fires when more than five login failures from one src_ip fall within 30 minutes; an SPL/KQL rendering follows the figure list.
Figure 2. Overview pipeline of ARuleCon.
Figure 3. Each component contributes to the performance.
Figure 4. Fine-grained conversion evaluation at the semantic level: redder regions highlight systematic improvements, blue regions the opposite.
Figure 5. Consistency between LLM-as-a-Judge and human evaluation.
Figure 7. Ablation study of semantic metrics on ARuleCon.
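For readers who know only one of the dialects, Listing 1's threshold rule would render roughly as follows in SPL and KQL. Listing D keeps both queries as Python strings for side-by-side comparison; the index, table, and field names (auth, AuthEvents, SrcIp, TimeGenerated) are editorial assumptions, not the paper's rule pairs, and the 30-minute bin is a tumbling-window approximation of the rule's window.

Listing D (editorial sketch, Python):

    # The same detection in two dialects (assumed schemas, tumbling 30 m bins).
    spl_rule = '''
    index=auth event_type="login_failure"
    | bin _time span=30m
    | stats count AS failures BY src_ip, _time
    | where failures > 5
    '''

    kql_rule = '''
    AuthEvents
    | where EventType == "login_failure"
    | summarize Failures = count() by SrcIp, bin(TimeGenerated, 30m)
    | where Failures > 5
    '''

Even in this small example the conversion is not token-by-token: SPL buckets time with an explicit bin-then-stats pair, while KQL folds the bucketing into summarize. That is exactly the kind of structural mismatch the agents have to negotiate.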
Original abstract

Security Information and Event Management (SIEM) systems detect intrusions and anomalies in real time by applying security rules. However, the heterogeneity of vendor-specific rule languages (e.g., Splunk SPL, Microsoft KQL, IBM AQL, Google YARA-L, and RSA ESA) makes cross-platform rule reuse extremely difficult, requiring deep domain knowledge for reliable conversion. An autonomous and accurate rule-conversion framework can therefore yield significant effort savings while preserving the value of existing rules. In this paper, we propose ARuleCon, an agentic SIEM-rule conversion approach. With ARuleCon, security professionals need neither to distill the source rules' logic nor to study the target vendor's documentation: ARuleCon converts rules to the target vendor's language without further intervention. To achieve this, ARuleCon is equipped with agents that resolve conversion and schema mismatches, and with a Python-based consistency check that runs both source and target rules in controlled test environments to mitigate subtle semantic drift. We present a comprehensive evaluation of ARuleCon covering textual alignment and execution success, showing that ARuleCon converts rules with high fidelity and outperforms the baseline LLM by 15% on average. Finally, we report case studies and interviews with our industry collaborators at Singtel Singapore, which show that ARuleCon significantly saves experts' time in understanding cross-SIEM documentation and remapping logic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ARuleCon, an agentic framework for converting SIEM security rules across heterogeneous vendors (Splunk SPL, Microsoft KQL, IBM AQL, Google YARA-L, RSA ESA). It uses LLM-based agents to handle conversion and schema mismatches without requiring users to distill source logic or target documentation, augmented by a Python-based consistency check that executes both source and target rules in controlled test environments. The central claims are high-fidelity conversion (textual alignment plus execution success) that outperforms a baseline LLM by 15% on average, plus time savings demonstrated via case studies and interviews with Singtel collaborators.

Significance. If the evaluation claims hold under rigorous testing, ARuleCon would offer a practical advance in automating cross-SIEM rule reuse, reducing expert effort in security operations centers where rule heterogeneity currently blocks reuse. The combination of agentic conversion with execution-based verification is a concrete step beyond pure prompting approaches; reproducible code or open test suites would further strengthen its impact.

major comments (2)
  1. [Abstract / Evaluation] Abstract and Evaluation section: the claim of a 15% average improvement in fidelity is presented without any reported test-set size, rule-selection criteria, number of conversion pairs, statistical tests, confidence intervals, or precise baseline prompting strategy (including temperature, few-shot examples, or model version). This prevents verification of the data-to-claim link and makes the performance delta impossible to reproduce or compare.
  2. [ARuleCon framework / Evaluation] Consistency-check description (Python-based execution in controlled environments): the method assumes that running source and target rules on the same test data will detect semantic drifts, yet the manuscript provides no evidence that the test scenarios cover edge cases such as null-value handling, time-window boundary conditions, aggregation over sparse logs, or vendor-specific schema quirks. Without such coverage metrics or failure-case analysis, the high-fidelity claim rests on an unverified assumption.
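A coverage suite answering this comment might look like Listing E: deliberately generated edge-case logs (null fields, exact window boundaries, sparse streams, empty input) run through both rules differentially. Everything in the sketch is hypothetical; the paper does not publish its test generator.

Listing E (editorial sketch, Python):

    # Hypothetical edge-case generator for the failure modes named above.
    WINDOW_S = 1800

    def edge_case_logs():
        """Yield small event logs targeting known conversion failure modes."""
        base = {"event_type": "login_failure", "src_ip": "10.0.0.1"}
        yield [{**base, "ts": 0, "src_ip": None}]                 # null field handling
        yield [{**base, "ts": t} for t in (0, WINDOW_S)]          # exact window boundary
        yield [{**base, "ts": t} for t in (0, WINDOW_S - 1)]      # just inside the window
        yield [{**base, "ts": 7200 * i} for i in range(6)]        # sparse: one event per 2 h
        yield []                                                   # empty log

    def differential_suite(source_rule, target_rule):
        """Return the indices of edge cases on which the two rules disagree."""
        return [i for i, events in enumerate(edge_case_logs())
                if source_rule(events) != target_rule(events)]

    if __name__ == "__main__":
        trivial = lambda events: len(events) > 0
        print(differential_suite(trivial, trivial))  # []: identical rules never diverge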
minor comments (2)
  1. [Introduction] Notation for rule languages (SPL, KQL, etc.) is introduced without a compact comparison table; adding one would improve readability for readers unfamiliar with all five dialects.
  2. [Case studies] The industry-interview protocol and participant count are mentioned only in passing; a brief description of interview questions or anonymized quotes would strengthen the qualitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have addressed each major comment below and revised the paper to improve the transparency and rigor of the evaluation.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim of a 15% average improvement in fidelity is presented without any reported test-set size, rule-selection criteria, number of conversion pairs, statistical tests, confidence intervals, or precise baseline prompting strategy (including temperature, few-shot examples, or model version). This prevents verification of the data-to-claim link and makes the performance delta impossible to reproduce or compare.

    Authors: We agree that these experimental details were insufficiently reported in the original manuscript, which limits reproducibility. We have revised the Evaluation section to include the test-set size and rule-selection criteria, the number of conversion pairs evaluated, the statistical tests and confidence intervals used, and the precise baseline prompting strategy (model version, temperature, and few-shot examples). The test rules and prompts have also been added to the supplementary material to enable independent verification of the 15% average improvement. revision: yes

  2. Referee: [ARuleCon framework / Evaluation] Consistency-check description (Python-based execution in controlled environments): the method assumes that running source and target rules on the same test data will detect semantic drifts, yet the manuscript provides no evidence that the test scenarios cover edge cases such as null-value handling, time-window boundary conditions, aggregation over sparse logs, or vendor-specific schema quirks. Without such coverage metrics or failure-case analysis, the high-fidelity claim rests on an unverified assumption.

    Authors: We acknowledge that the original consistency-check description lacked explicit discussion of edge-case coverage and failure analysis. We have revised the manuscript to add a dedicated subsection describing the test scenarios, which now explicitly include null-value handling, time-window boundaries, sparse log aggregation, and vendor-specific schema differences. We also include a failure-case analysis with concrete examples of detected semantic drifts. While exhaustive coverage of all possible edge cases is not feasible, these additions provide evidence supporting the check's effectiveness. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external execution checks

full rationale

The paper's central claims of high-fidelity rule conversion (15% average improvement over baseline LLM) are supported by empirical evaluation metrics: textual alignment, execution success rates, and a Python-based consistency check that executes both source and target rules in controlled test environments. These steps constitute external validation against observable behavior rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. No equations, ansatzes, or uniqueness theorems are invoked that reduce the result to the inputs by construction. The chain from evidence to claim is therefore free of circular dependencies.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unproven assumption that current LLMs plus execution tests can faithfully preserve rule semantics across vendor languages; this is a domain assumption about AI capability rather than a derived result.

axioms (2)
  • domain assumption Large language models can accurately parse and rewrite security rule logic across heterogeneous vendor syntaxes without human intervention.
    Invoked when the paper states that professionals do not need to distill source logic or study target documentation.
  • domain assumption Execution of rules in controlled Python environments can detect and correct subtle semantic drifts introduced during conversion.
    Central to the consistency-check component described in the abstract.
invented entities (1)
  • ARuleCon agentic conversion framework (no independent evidence)
    purpose: Autonomous translation of SIEM rules with built-in verification
    The proposed system itself; no independent falsifiable evidence outside the paper is supplied.

pith-pipeline@v0.9.0 · 5570 in / 1470 out tokens · 65299 ms · 2026-05-10T17:37:38.885904+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 13 canonical work pages · 4 internal anchors

  1. [1] 2022. meta-llama/Llama-3.1-405B · Hugging Face. https://huggingface.co/meta-llama/Llama-3.1-405B

  2. [2] 2022. text-embedding-ada-002. https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

  3. [3] 2024. deepseek-ai/DeepSeek-V3 · Hugging Face. https://huggingface.co/deepseek-ai/DeepSeek-V3

  4. [4] Andi Albrecht and contributors. [n. d.]. sqlparse: non-validating SQL parser for Python. https://pypi.org/project/sqlparse/

  5. [5] Ariel Query Language Guide. [n. d.]. https://www.ibm.com/docs/en/SS42VS_7.4/pdf/b_qradar_aql.pdf

  6. [6] Sandeep N. Bhatt, Pratyusa K. Manadhata, and Loai Zomlot. 2014. The Operational Role of Security Information and Event Management Systems. IEEE Secur. Priv. 12, 5 (2014), 35–41. doi:10.1109/MSP.2014.103

  7. [7] Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, and Xueyuan Han. 2023. Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance. CoRR abs/2308.05034 (2023). arXiv:2308.05034 doi:10.48550/ARXIV.2308.05034

  8. [8] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevo...

  9. [9] Google Cloud. [n. d.]. YARA-L 2.0 language syntax. https://cloud.google.com/chronicle/docs/detection/yara-l-2-0-syntax

  10. [10] Google Security Operations (SecOps). [n. d.]. Google SecOps' cloud-native security operations platform empowers security teams to better detect, investigate, and respond to cybersecurity threats. https://cloud.google.com/security/products/security-operations?hl=en

  11. [11] Google YARA-L Rules. [n. d.]. Google YARA-L Rule Dataset. https://github.com/chronicle/detection-rules/tree/main/rules/community

  12. [12] Wajih Ul Hassan, Adam Bates, and Daniel Marino. 2020. Tactical Provenance Analysis for Endpoint Detection and Response Systems. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020. IEEE, 1172–1189. doi:10.1109/SP40000.2020.00096

  13. [13] IBM. 2024. SPL To KQL Application.

  14. [14] IBM AQL Rules. [n. d.]. IBM QRadar AQL Rule Dataset. https://github.com/Xboarder56/QRCE-Rules

  15. [15] Cohen's Kappa. [n. d.]. A measure of agreement between two dependent categorical samples. https://datatab.net/tutorial/cohens-kappa

  16. [16] Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, and Yiqun Liu. 2024. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods. arXiv:2412.05579 [cs.CL] https://arxiv.org/abs/2412.05579

  17. [17] Ruishi Li, Bo Wang, Tianyu Li, Prateek Saxena, and Ashish Kundu. 2025. Translating C To Rust: Lessons from a User Study. In 32nd Annual Network and Distributed System Security Symposium, NDSS 2025, San Diego, California, USA, February 24-28, 2025. The Internet Society. https://www.ndss-symposium.org/ndss-paper/translating-c-to-rust-lessons-from-a-us...

  18. [18] Jiawei Liu, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, and Gang Wang. 2025. PurpCode: Reasoning for Safer Code Generation. arXiv:2507.19060 [cs.CR] https://arxiv.org/abs/2507.19060

  19. [19] Zhengxiong Luo, Huan Zhao, Dylan Wolff, Cristian Cadar, and Abhik Roychoudhury. 2026. Agentic Concolic Execution. In 2026 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 1–19.

  20. [20] Microsoft. 2025. Microsoft Sentinel - Cloud-native SIEM Solution | Microsoft Azure. https://azure.microsoft.com/en-us/products/microsoft-sentinel

  21. [21] Microsoft. [n. d.]. Kusto Query Language overview. https://learn.microsoft.com/en-us/kusto/query/?view=microsoft-fabric

  22. [22] Microsoft KQL Rules. [n. d.]. Microsoft Sentinel KQL Rule Dataset. https://github.com/Azure/Azure-Sentinel/tree/master/Detections

  23. [23] Amadou Latyr Ngom and Tim Kraska. 2024. Mallet: SQL Dialect Translation with LLM Rule Generation. In Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2024, Santiago, Chile, 14 June 2024. ACM, 3:1–3:5. doi:10.1145/3663742.3663973

  24. [24] Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Zhi Jin, and Jin Song Dong. 2025. Intention-driven generation of project-specific test cases. arXiv preprint arXiv:2507.20619 (2025).

  25. [25] Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. CoRR abs/2009.10297 (2020). arXiv:2009.10297 https://arxiv.org/abs/2009.10297

  26. [26] RSA ESA Rules. [n. d.]. RSA NetWitness ESA Rule Dataset. https://github.com/netwitness/nw-esa

  27. [27] Rule Syntax. [n. d.]. NetWitness Community. https://community.netwitness.com/s/article/RuleSyntax

  28. [28] Stuart J. Russell and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Pearson.

  29. [29] SIEM Migration. [n. d.]. SIEM Migration: Challenges and Strategies. https://cardinalops.com/blog/siem-migration-challenges-strategies/

  30. [30] Kirill Simonov and contributors. [n. d.]. PyYAML: YAML parser and emitter for Python. https://pypi.org/project/PyYAML/

  31. [31] Splunk. [n. d.]. Splunk Enterprise Search Reference. https://help.splunk.com/en/splunk-enterprise/search/spl-search-reference/9.4/introduction/welcome-to-the-search-reference

  32. [32] Splunk (a Cisco company). [n. d.]. State of Security 2024: The Race to Harness AI. https://shorturl.at/dgTsP

  33. [33] Splunk SPL Rules. [n. d.]. Splunk SPL Rule Dataset. https://github.com/splunk/security_content/tree/develop/detections

  34. [34] Threat Report. [n. d.]. 67 Cyber Security Statistics, Facts and Trends: Data on Attacks, Breaches and Threats for 2025. https://www.cloudwards.net/cyber-security-statistics/

  35. [35] Rafael Uetz, Marco Herzog, Louis Hackländer, Simon Schwarz, and Martin Henze. 2023. You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks. CoRR abs/2311.10197 (2023). arXiv:2311.10197 doi:10.48550/ARXIV.2311.10197

  36. [36] Sapienza, J.

  37. [37] Mati Ur Rehman, Hadi Ahmadi, and Wajih Ul Hassan. 2024. Flash: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning. In 2024 IEEE Symposium on Security and Privacy (SP). 3552–3570. doi:10.1109/SP54263.2024.00139

  38. [38] VirusTotal. [n. d.]. yara-python: The Python interface for YARA. https://pypi.org/project/yara-python/

  39. [39] Hongtai Wang, Ming Xu, Yanpei Guo, Weili Han, Hoon Wei Lim, and Jin Song Dong. 2025. RulePilot: An LLM-Powered Agent for Security Rule Generation. CoRR abs/2511.12224 (2025). arXiv:2511.12224 doi:10.48550/ARXIV.2511.12224

  40. [40] Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, and Sida I. Wang. 2025. SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution. arXiv:2502.18449 [cs.SE] https://arxiv.org/abs/2502.18449

  41. [41] Why SIEM Migration. [n. d.]. Demystifying SIEM migration: Pitfalls to avoid and tips for ensuring success.

  42. [42] Pengfei Wu, Jianting Ning, Wu Luo, Xinyi Huang, and Debiao He. 2021. Exploring dynamic task loading in SGX-based distributed computing. IEEE Transactions on Services Computing 16, 1 (2021), 288–301.

  43. [43] Wei Zhou, Guoliang Li, Haoyu Wang, Yuxing Han, Xufei Wu, Fan Wu, and Xuanhe Zhou. 2025. PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation. arXiv:2509.23338 [cs.DB] https://arxiv.org/abs/2509.23338