pith. sign in

arxiv: 2607.01742 · v1 · pith:L6IJ4IIFnew · submitted 2026-07-02 · 💻 cs.CR · cs.SE

Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection

Pith reviewed 2026-07-03 11:05 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords smart contractsvulnerability detectionprocedural knowledgeknowledge evolutionexecutable policieslarge language modelsfew-shot evolution
0
0 comments X

The pith

Vulnerability detection in smart contracts can be achieved by evolving procedural knowledge from a handful of examples rather than updating model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that smart contract vulnerability detection can be done by automatically evolving step-by-step detection rules from very few examples. It uses a special runtime to enforce the rules strictly and refine them in phases that test synthetic cases then real ones. This would matter because current methods either need lots of manual work or lots of data which is hard to get for these bugs. If true, it means reliable detectors can be created and moved between models at low cost.

Core claim

EvoVuln reformulates vulnerability detection as a procedural knowledge evolution problem, synthesizing and refining detection logic using only a minimal number of labeled samples through an Inversion of Control runtime for executable policies and a two-phase evolution pipeline.

What carries the argument

The Inversion of Control runtime that compiles detection rules into Executable Policies to decouple deterministic control flow from LLM semantic reasoning and produce dense diagnostic telemetry.

If this is right

  • The evolved procedural knowledge enables a lightweight, low-cost model to surpass a much larger zero-shot model by 19 percentage points.
  • The knowledge transfers to other LLMs without retraining.
  • The one-time evolution cost is under $50.
  • Detection achieves a 71% macro-average F1-score across five real-world vulnerability types, outperforming baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could be extended to other security tasks involving logic bugs in code where labeled data is scarce.
  • The portability of the rules suggests they could be shared and improved collectively rather than recreated for each model.
  • Testing on evolving smart contract languages or new bug types would show if the method stays effective over time.

Load-bearing premise

The LLM can refine the detection rules abductively from only five vulnerable and five safe examples per type without missing important edge cases or introducing biases.

What would settle it

Measuring performance on a new collection of contracts with the same vulnerabilities but different code structures and seeing if the F1-score falls below the baselines.

Figures

Figures reproduced from arXiv: 2607.01742 by Han Liu, Yang Liu, Ying Li, Yiran Zhang, Yuqiang Sun, Ziyun Guo, Zong Cao.

Figure 1
Figure 1. Figure 1: A precision issue vulnerability analyzed with two [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An overview of EVOVULN. 1 # Simplified Executable Policy for price ,→ manipulation. 2 # `runtime` is injected by the Runtime. 3 code = """A code snippet to be detected.""" 4 def main(): 5 # Focus on a code region (e.g., a function). 6 runtime.focus("{a function}") 7 # Pose semantic queries to the LLM. 8 has_balance_based_price = runtime.seen("Does the code calculate prices by third party balance?") ,→ ,→ 9… view at source ↗
Figure 3
Figure 3. Figure 3: An example of an Executable Policy. EP Generation & Execution. Given a detection rule and a target contract, the LLM generates an EP ( [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
read the original abstract

Smart contract vulnerabilities are predominantly logic bugs whose detection requires structured, step-by-step procedural knowledge of attack patterns and contract semantics. Existing LLM-based methods struggle to generate this knowledge automatically: prompt-based methods rely on manually crafted detection rules, while fine-tuning requires massive labeled datasets that are inherently scarce in this domain. We present EvoVuln, an automated framework that reformulates vulnerability detection as a procedural knowledge evolution problem, synthesizing and refining detection logic using only a minimal number of labeled samples. To achieve this, EvoVuln introduces two key mechanisms. First, a Runtime with an Inversion of Control (IoC) architecture compiles detection rules into Executable Policies. This strictly decouples deterministic control flow from LLM semantic reasoning, ensuring faithful logical adherence and producing dense diagnostic telemetry for precise error localization. Second, a two-phase evolution pipeline refines the rule via abductive semantic debugging without any parameter updates: Cold Start bootstraps and stress-tests an initial rule using auto-synthesized corner cases; Few-Shot Evolving then grounds the policy in real-world semantics using only five vulnerable and five safe examples per vulnerability type. Evaluated across five real-world vulnerability types, EvoVuln achieves a 71% macro-average F1-score, outperforming all baselines. The evolved procedural knowledge is portable across models: it enables a lightweight, low-cost model to surpass a much larger zero-shot model by 19 percentage points, and transfers to other LLMs without retraining, at a one-time evolution cost under $50.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents EvoVuln, a framework that reformulates smart contract vulnerability detection as procedural knowledge evolution. It introduces an IoC runtime to compile rules into Executable Policies that decouple control flow from LLM reasoning, and a two-phase pipeline (Cold Start with auto-synthesized corner cases, followed by Few-Shot Evolving using exactly five vulnerable and five safe examples per type) that performs abductive refinement without parameter updates. The central empirical claim is a 71% macro-average F1 across five real-world vulnerability types, outperforming baselines, with the evolved policies being portable across LLMs (enabling a lightweight model to exceed a larger zero-shot model by 19 points) at a one-time cost under $50.

Significance. If the evaluation details and generalizability hold, the work offers a promising direction for data-scarce security domains by showing that portable, executable detection logic can be synthesized from minimal examples rather than large labeled sets or fine-tuning. The IoC architecture for producing dense diagnostics and ensuring logical fidelity is a concrete technical contribution that could be adopted more broadly. The low-cost portability result, if reproducible, would be practically relevant for blockchain auditing tools.

major comments (2)
  1. [§4 (Evaluation)] §4 (Evaluation) and abstract: the 71% macro F1, 19-point portability gain, and 'outperforming all baselines' claims are stated without naming the concrete baselines, describing dataset construction or train/test splits, reporting statistical significance, or detailing how the macro F1 was aggregated across the five vulnerability types; these omissions make the central performance claims unverifiable from the supplied information.
  2. [§3.2 (Few-Shot Evolving)] §3.2 (Few-Shot Evolving): the abductive refinement step grounds policies using only five vulnerable and five safe examples per type and relies on the IoC runtime for error localization, yet no coverage metrics, edge-case enumeration, or verification of test-set disjointness from the evolution examples are reported; if the LLM refinements systematically miss semantic interactions (e.g., reentrancy with particular storage layouts), the reported F1 and cross-model portability would not hold.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'outperforming all baselines' should be accompanied by at least the names of the strongest baselines for immediate context.
  2. [§2] Notation: the term 'Executable Policies' is introduced without an explicit formal definition or pseudocode in the early sections; a short definition box would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on verifiability and robustness. We address each major point below and will revise the manuscript to improve clarity without altering the core claims or methodology.

read point-by-point responses
  1. Referee: [§4 (Evaluation)] §4 (Evaluation) and abstract: the 71% macro F1, 19-point portability gain, and 'outperforming all baselines' claims are stated without naming the concrete baselines, describing dataset construction or train/test splits, reporting statistical significance, or detailing how the macro F1 was aggregated across the five vulnerability types; these omissions make the central performance claims unverifiable from the supplied information.

    Authors: We agree that the abstract and §4 would benefit from greater explicitness to make the claims verifiable. In the revised version we will: (1) name the concrete baselines (zero-shot GPT-4, prior fine-tuned detectors, and static tools) in both the abstract and §4.2; (2) describe dataset construction, including sources, labeling process, and total contract counts; (3) specify the train/test splits and confirm they are disjoint; (4) report statistical significance via bootstrap resampling; and (5) state that macro F1 is the unweighted mean of the five per-type F1 scores. These additions will be placed in a new subsection of §4. revision: yes

  2. Referee: [§3.2 (Few-Shot Evolving)] §3.2 (Few-Shot Evolving): the abductive refinement step grounds policies using only five vulnerable and five safe examples per type and relies on the IoC runtime for error localization, yet no coverage metrics, edge-case enumeration, or verification of test-set disjointness from the evolution examples are reported; if the LLM refinements systematically miss semantic interactions (e.g., reentrancy with particular storage layouts), the reported F1 and cross-model portability would not hold.

    Authors: We acknowledge that explicit verification of disjointness and coverage would strengthen the section. In revision we will add: (a) confirmation that the five vulnerable and five safe examples per type are disjoint from the test set, (b) enumeration of the edge cases synthesized in the Cold Start phase together with coverage metrics, and (c) expanded discussion of how the IoC runtime's error localization addresses potential missed semantic interactions. We will also note this as a limitation in §5. The current empirical results and the runtime design provide supporting evidence, but we agree the additional reporting is warranted. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results rest on external test-set comparisons

full rationale

The paper describes an empirical two-phase evolution pipeline (Cold Start + Few-Shot Evolving) that produces Executable Policies from 5+5 labeled examples per vulnerability type, then reports macro F1 and cross-model portability on held-out real-world contracts. No equations, fitted parameters, or self-citations are invoked to derive the 71% F1 or 19-point gains; performance is measured by direct comparison against baselines on external data. The IoC runtime and abductive refinements are procedural mechanisms whose outputs are validated externally rather than defined by construction from the input examples. This is a standard self-contained empirical claim with no load-bearing reduction to internal fits or author prior work.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The framework rests on the domain assumption that LLMs can perform reliable abductive debugging when supplied with executable policies and error telemetry, plus the modeling choice that ten examples suffice to ground rules in real semantics.

free parameters (1)
  • examples per vulnerability type = 5 + 5
    The few-shot phase is defined to use exactly five vulnerable and five safe contracts; this count is a design choice that directly affects the refinement outcome.
axioms (1)
  • domain assumption LLMs supplied with executable policies and diagnostic telemetry can perform abductive semantic debugging that improves rule quality
    Invoked in the description of the Few-Shot Evolving phase.
invented entities (1)
  • Executable Policies no independent evidence
    purpose: Compile detection rules into a form that strictly separates deterministic control flow from LLM semantic reasoning
    New architectural component introduced to guarantee logical adherence and produce dense telemetry.

pith-pipeline@v0.9.1-grok · 5811 in / 1530 out tokens · 51216 ms · 2026-07-03T11:05:09.992476+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

64 extracted references · 8 canonical work pages · 2 internal anchors

  1. [1]

    Defi hacks analysis - root cause analysis,

    “Defi hacks analysis - root cause analysis,” https://web3sec.notion.site/ c582b99cd7a84be48d972ca2126a2a1f, 2026

  2. [2]

    Gptscan: Detecting logic vulnerabilities in smart contracts by combining gpt with program analysis,

    Y . Sun, D. Wu, Y . Xue, H. Liu, H. Wang, Z. Xu, X. Xie, and Y . Liu, “Gptscan: Detecting logic vulnerabilities in smart contracts by combining gpt with program analysis,” inProceedings of the IEEE/ACM 46th international conference on software engineering, 2024, pp. 1–13

  3. [3]

    Propertygpt: Llm-driven formal verification of smart contracts through retrieval- augmented property generation,

    Y . Liu, Y . Xue, D. Wu, Y . Sun, Y . Li, M. Shi, and Y . Liu, “Propertygpt: Llm-driven formal verification of smart contracts through retrieval- augmented property generation,” inProceedings of the Network and Distributed System Security Symposium (NDSS), 2025

  4. [4]

    Smartguard: An llm- enhanced framework for smart contract vulnerability detection,

    H. Ding, Y . Liu, X. Piao, H. Song, and Z. Ji, “Smartguard: An llm- enhanced framework for smart contract vulnerability detection,”Expert Systems with Applications, vol. 269, p. 126479, 2025

  5. [5]

    Combining fine-tuning and llm-based agents for intuitive smart contract auditing with justifications,

    W. Ma, D. Wu, Y . Sun, T. Wang, S. Liu, J. Zhang, Y . Xue, and Y . Liu, “Combining fine-tuning and llm-based agents for intuitive smart contract auditing with justifications,” in2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2025, pp. 1742– 1754

  6. [6]

    Agent4vul: multimodal llm agents for smart contract vulnerability detection,

    W. Jie, W. Qiu, H. Yang, M. Guo, X. Huang, T. Lei, Q. Zhang, H. Zheng, and Z. Zheng, “Agent4vul: multimodal llm agents for smart contract vulnerability detection,”Science China Information Sciences, vol. 68, no. 6, p. 160101, 2025

  7. [7]

    Smartinv: Multimodal learning for smart contract invariant inference,

    S. J. Wang, K. Pei, and J. Yang, “Smartinv: Multimodal learning for smart contract invariant inference,” in2024 IEEE Symposium on Security and Privacy (SP). IEEE, 2024, pp. 2217–2235

  8. [8]

    Re-task: Revisiting llm tasks from capability, skill, and knowledge perspectives,

    Z. Wang, S. Zhao, Y . Wang, H. Huang, S. Xie, Y . Zhang, J. Shi, Z. Wang, H. Li, and J. Yan, “Re-task: Revisiting llm tasks from capability, skill, and knowledge perspectives,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 4925–4936

  9. [9]

    SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    Y . Jiang, D. Li, H. Deng, B. Ma, X. Wang, Q. Wang, and G. Yu, “Sok: Agentic skills–beyond tool use in llm agents,”arXiv preprint arXiv:2602.20867, 2026

  10. [10]

    Toolformer: Language models can teach themselves to use tools,

    T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,”Advances in neural informa- tion processing systems, vol. 36, pp. 68 539–68 551, 2023

  11. [11]

    Slither: a static analysis framework for smart contracts,

    J. Feist, G. Grieco, and A. Groce, “Slither: a static analysis framework for smart contracts,” inProceedings of the 2019 2nd IEEE/ACM in- ternational workshop on emerging trends in software engineering for blockchain (WETSEB). New York, NY , USA: IEEE, 2019, pp. 8–15

  12. [12]

    Mando-guru: vulnerability detection for smart contract source code by heterogeneous graph embeddings,

    H. H. Nguyen, N.-M. Nguyen, H.-P. Doan, Z. Ahmadi, T.-N. Doan, and L. Jiang, “Mando-guru: vulnerability detection for smart contract source code by heterogeneous graph embeddings,” inProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 1736–1740

  13. [13]

    Sael: Leveraging large language models with adaptive mixture-of-experts for smart contract vulnerability detection,

    L. Yu, S. Cheng, Z. Huang, J. Zhang, C. Shen, J. Lu, L. Yang, F. Zhang, and J. Ma, “Sael: Leveraging large language models with adaptive mixture-of-experts for smart contract vulnerability detection,” in2025 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2025, pp. 61–72

  14. [14]

    Gpt-5.2,

    “Gpt-5.2,” https://developers.openai.com/api/docs/models/gpt-5.2, 2026

  15. [15]

    Gpt-5 nano,

    “Gpt-5 nano,” https://developers.openai.com/api/docs/models/ gpt-5-nano, 2026

  16. [16]

    Claude code,

    Anthropic, “Claude code,” https://www.anthropic.com/claude-code, 2025

  17. [17]

    OpenAI, “Codex,” https://openai.com/codex/, 2025

  18. [18]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

  19. [19]

    Faithful logical reasoning via symbolic chain-of-thought,

    J. Xu, H. Fei, L. Pan, Q. Liu, M.-L. Lee, and W. Hsu, “Faithful logical reasoning via symbolic chain-of-thought,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024

  20. [20]

    Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

    S. Boppana, A. Ma, M. Loeffler, R. Sarfati, E. Bigelow, A. Geiger, O. Lewis, and J. Merullo, “Reasoning theater: Disentangling model beliefs from chain-of-thought,” 2026. [Online]. Available: https: //arxiv.org/abs/2603.05488

  21. [21]

    Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks,

    W. Chen, X. Ma, X. Wang, and W. W. Cohen, “Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks,”Transactions on Machine Learning Research, 2023

  22. [22]

    Chain-of- thought prompting of large language models for discovering and fixing software vulnerabilities,

    Y . Nong, M. Aldeen, L. Cheng, H. Hu, F. Chen, and H. Cai, “Chain-of- thought prompting of large language models for discovering and fixing software vulnerabilities,”arXiv preprint arXiv:2402.17230, 2024

  23. [23]

    Preemptive answer “attacks

    R. Xu, Z. Qi, and W. Xu, “Preemptive answer “attacks” on chain-of- thought reasoning,” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 14 708–14 726

  24. [24]

    Towards combining chain-of-thought and code static analysis for buffer overflow vulnerability detection,

    Y . Huang, Y . Chen, J. Cai, J. Wu, X. Chen, P. Shen, and L. Yun, “Towards combining chain-of-thought and code static analysis for buffer overflow vulnerability detection,”Software Quality Journal, vol. 34, no. 1, p. 2, 2026

  25. [25]

    Towards explainable vulnerability detection with large language models,

    Q. Mao, Z. Li, X. Hu, K. Liu, X. Xia, and J. Sun, “Towards explainable vulnerability detection with large language models,”IEEE Transactions on Software Engineering, 2025

  26. [26]

    Large language model for vulnerability detection: Emerging results and future directions,

    X. Zhou, T. Zhang, and D. Lo, “Large language model for vulnerability detection: Emerging results and future directions,” inProceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineer- ing: New Ideas and Emerging Results, 2024, pp. 47–51

  27. [27]

    5gpt: 5g vulnerability detection by combining zero-shot capabilities of gpt-4 with domain aware strategies through prompt engineering,

    A. Shahriar, S. J. Hisham, K. A. Rahman, M. R. Islam, M. S. Hossain, R.-H. Hwang, and Y .-D. Lin, “5gpt: 5g vulnerability detection by combining zero-shot capabilities of gpt-4 with domain aware strategies through prompt engineering,”IEEE Transactions on Information Foren- sics and Security, 2025

  28. [28]

    Examining zero-shot vulnerability repair with large language models,

    H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, “Examining zero-shot vulnerability repair with large language models,” in2023 IEEE symposium on security and privacy (SP). IEEE, 2023, pp. 2339–2356

  29. [29]

    Hybrid language processor fuzzing via llm-based constraint solving,

    Y . Yang, S. Yao, J. Chen, and W. Lee, “Hybrid language processor fuzzing via llm-based constraint solving,” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 6299–6318

  30. [30]

    Qwen3.5: Towards native multimodal agents,

    Qwen Team, “Qwen3.5: Towards native multimodal agents,” February

  31. [31]

    Available: https://qwen.ai/blog?id=qwen3.5

    [Online]. Available: https://qwen.ai/blog?id=qwen3.5

  32. [32]

    Minimax m2.5 sota in coding and agent, designed for agent universe,

    M. AI, “Minimax m2.5 sota in coding and agent, designed for agent universe,” https://www.minimax.io/models/text, 2026

  33. [33]

    The unified interface for llms,

    “The unified interface for llms,” https://openrouter.ai/, 2026

  34. [34]

    Mythril,

    “Mythril,” https://github.com/Consensys/mythril, 2026

  35. [35]

    Securify: Practical security analysis of smart contracts,

    P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Buenzli, and M. Vechev, “Securify: Practical security analysis of smart contracts,” inProceedings of the 2018 ACM SIGSAC conference on computer and communications security (CCS). New York, NY , USA: ACM, 2018, pp. 67–82

  36. [36]

    Osiris: Hunting for integer bugs in ethereum smart contracts,

    C. F. Torres, J. Sch ¨utte, and R. State, “Osiris: Hunting for integer bugs in ethereum smart contracts,” inProceedings of the 2018 34th annual computer security applications conference (ACSAC). New York, NY , USA: ACM, 2018, pp. 664–676

  37. [37]

    Manticore: A user-friendly symbolic execution framework for binaries and smart contracts,

    M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson, and A. Dinaburg, “Manticore: A user-friendly symbolic execution framework for binaries and smart contracts,” inProceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York, NY , USA: IEEE, 2019, pp. 1186–1189

  38. [38]

    Making smart contracts smarter,

    L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contracts smarter,” inProceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 254–269

  39. [39]

    Efficiently detecting reentrancy vulnerabilities in complex smart contracts,

    Z. Wang, J. Chen, Y . Wang, Y . Zhang, W. Zhang, and Z. Zheng, “Efficiently detecting reentrancy vulnerabilities in complex smart contracts,” vol. 1. [Online]. Available: https://doi.org/10.1145/3643734 11

  40. [40]

    Cross-contract static analysis for detecting practical reentrancy vulnerabilities in smart contracts,

    Y . Xue, M. Ma, Y . Lin, Y . Sui, J. Ye, and T. Peng, “Cross-contract static analysis for detecting practical reentrancy vulnerabilities in smart contracts,” inProceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, ser. Ase ’20. Asso- ciation for Computing Machinery, pp. 1029–1040. [Online]. Available: https://doi-org...

  41. [41]

    Williams

    Z. Zheng, N. Zhang, J. Su, Z. Zhong, M. Ye, and J. Chen, “Turn the rudder: A beacon of reentrancy detection for smart contracts on ethereum,” inProceedings of the 45th International Conference on Software Engineering, ser. Icse ’23. IEEE Press, pp. 295–306. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00036

  42. [42]

    Clairvoyance: Cross- contract static analysis for detecting practical reentrancy vulnerabilities in smart contracts,

    J. Ye, M. Ma, Y . Lin, Y . Sui, and Y . Xue, “Clairvoyance: Cross- contract static analysis for detecting practical reentrancy vulnerabilities in smart contracts,” inProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020, pp. 274–275

  43. [43]

    Reentrancy vulnerability detection and localization: A deep learning based two-phase approach,

    Z. Zhang, Y . Lei, M. Yan, Y . Yu, J. Chen, S. Wang, and X. Mao, “Reentrancy vulnerability detection and localization: A deep learning based two-phase approach,” inProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, pp. 1–13

  44. [44]

    Reguard: finding reentrancy bugs in smart contracts,

    C. Liu, H. Liu, Z. Cao, Z. Chen, B. Chen, and B. Roscoe, “Reguard: finding reentrancy bugs in smart contracts,” inProceedings of the 40th International Conference on Software Engineering: Companion Proceeedings, ser. ICSE ’18. New York, NY , USA: Association for Computing Machinery, 2018, p. 65–68. [Online]. Available: https://doi.org/10.1145/3183440.3183495

  45. [45]

    Silence false alarms: Identifying anti-reentrancy patterns on ethereum to refine smart contract reentrancy detection,

    Q. Song, H. Huang, X. Jia, Y . Xie, and J. Cao, “Silence false alarms: Identifying anti-reentrancy patterns on ethereum to refine smart contract reentrancy detection,” inNDSS, 2025

  46. [46]

    SAIL- FISH: Vetting smart contract state-inconsistency bugs in seconds,

    P. Bose, D. Das, Y . Chen, Y . Feng, C. Kruegel, and G. Vigna, “SAIL- FISH: Vetting smart contract state-inconsistency bugs in seconds,” in 2022 IEEE Symposium on Security and Privacy (SP), pp. 161–178

  47. [47]

    Achecker: Statically de- tecting smart contract access control vulnerabilities,

    A. Ghaleb, J. Rubin, and K. Pattabiraman, “Achecker: Statically de- tecting smart contract access control vulnerabilities,” inProceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, May 2023, p. 945–956

  48. [48]

    Beyond “protected

    Y . Fang, D. Wu, X. Yi, S. Wang, Y . Chen, M. Chen, Y . Liu, and L. Jiang, “Beyond “protected” and “private”: An empirical security analysis of custom function modifiers in smart contracts,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, New York, NY , USA, 2023, p. 1157–1168

  49. [49]

    Finding permission bugs in smart contracts with role mining,

    Y . Liu, Y . Li, S.-W. Lin, and C. Artho, “Finding permission bugs in smart contracts with role mining,” inProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA

  50. [50]

    New York, NY , USA: Association for Computing Machinery, 2022, p. 716–727

  51. [51]

    Prettysmart: Detecting permission re-delegation vulnerability for token behaviors in smart contracts,

    Z. Zhong, Z. Zheng, H.-N. Dai, Q. Xue, J. Chen, and Y . Nan, “Prettysmart: Detecting permission re-delegation vulnerability for token behaviors in smart contracts,” inProceedings of the IEEE/ACM 46th International Conference on Software Engineering, ser. ICSE ’24. New York, NY , USA: Association for Computing Machinery, 2024

  52. [52]

    Actaint: Agent- based taint analysis for access control vulnerabilities in smart contracts,

    H. Lin, Z. Gao, J. Chen, X. Chen, X. Yang, and L. Bao, “Actaint: Agent- based taint analysis for access control vulnerabilities in smart contracts,” inProceedings of the 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York, NY , USA: IEEE, 2025, pp. 2555–2567

  53. [53]

    Have We Solved Access Control Vulnerability Detection in Smart Contracts? A Benchmark Study,

    H. Liu, D. Wu, Y . Sun, S. Wang, Y . Liu, and Y . Chen, “Have We Solved Access Control Vulnerability Detection in Smart Contracts? A Benchmark Study,” inProceedings of the 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York, NY , USA: IEEE, 2025, pp. 1995–2007

  54. [54]

    Defitainter: Detect- ing price manipulation vulnerabilities in defi protocols,

    Q. Kong, J. Chen, Y . Wang, Z. Jiang, and Z. Zheng, “Defitainter: Detect- ing price manipulation vulnerabilities in defi protocols,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023, pp. 1144–1156

  55. [55]

    De- firanger: Detecting defi price manipulation attacks,

    S. Wu, Z. Yu, D. Wang, Y . Zhou, L. Wu, H. Wang, and X. Yuan, “De- firanger: Detecting defi price manipulation attacks,”IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 4, pp. 4147–4161, 2023

  56. [56]

    Toward automated detecting unanticipated price feed in smart contract,

    Y . Mo, J. Chen, Y . Wang, and Z. Zheng, “Toward automated detecting unanticipated price feed in smart contract,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023, pp. 1257–1268

  57. [57]

    Cobra: interaction-aware bytecode- level vulnerability detector for smart contracts,

    W. Li, X. Li, Z. Li, and Y . Zhang, “Cobra: interaction-aware bytecode- level vulnerability detector for smart contracts,” inProceedings of the 39th IEEE/ACM international conference on automated software engineering, 2024, pp. 1358–1369

  58. [58]

    A novel method for vulnerability detection based on fusion and hyperbolic neural network graphs,

    X. Li, A. Bhattacharjya, Q. Li, M. Zhou, R. Wisniewski, and G. Bazydło, “A novel method for vulnerability detection based on fusion and hyperbolic neural network graphs,”IEEE Transactions on Software Engineering, 2026

  59. [59]

    When chatgpt meets smart contract vulnerability detec- tion: How far are we?

    C. Chen, J. Su, J. Chen, Y . Wang, T. Bi, J. Yu, Y . Wang, X. Lin, T. Chen, and Z. Zheng, “When chatgpt meets smart contract vulnerability detec- tion: How far are we?”ACM Transactions on Software Engineering and Methodology, vol. 34, no. 4, pp. 1–30, 2025

  60. [60]

    Do you still need a manual smart contract audit?

    I. David, L. Zhou, K. Qin, D. Song, L. Cavallaro, and A. Gervais, “Do you still need a manual smart contract audit?”arXiv preprint arXiv:2306.12338, 2023

  61. [61]

    Smart-llama-dpo: Reinforced large language model for explainable smart contract vulnerability detection,

    L. Yu, Z. Huang, H. Yuan, S. Cheng, L. Yang, F. Zhang, C. Shen, J. Ma, J. Zhang, J. Luet al., “Smart-llama-dpo: Reinforced large language model for explainable smart contract vulnerability detection,” Proceedings of the ACM on Software Engineering, vol. 2, no. ISSTA, pp. 182–205, 2025

  62. [62]

    Numscout: Unveiling numerical defects in smart contracts using llm-pruning symbolic execution,

    J. Chen, Z. Shao, S. Yang, Y . Shen, Y . Wang, T. Chen, Z. Shan, and Z. Zheng, “Numscout: Unveiling numerical defects in smart contracts using llm-pruning symbolic execution,”IEEE Transactions on Software Engineering, 2025

  63. [63]

    Self-refine: Iter- ative refinement with self-feedback,

    A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y . Yanget al., “Self-refine: Iter- ative refinement with self-feedback,”Advances in neural information processing systems, vol. 36, pp. 46 534–46 594, 2023

  64. [64]

    Re- flexion: Language agents with verbal reinforcement learning,

    N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Re- flexion: Language agents with verbal reinforcement learning,”Advances in neural information processing systems, vol. 36, pp. 8634–8652, 2023. 12