On the Shoulders of Giants: Empowering Automated Smart Contract Auditing via the GiAnt Corpus

Feifei Niu; Xiaoting Zhang; Xing Hu; Xin Xia; Yiran Lv; Zhipeng Gao

arXiv: 2606.07363 · v1 · pith:3FN36EVN · submitted 2026-06-05 · 💻 cs.CR · cs.SE

Pith reviewed 2026-06-27 21:48 UTC · model grok-4.3

classification 💻 cs.CR cs.SE

keywords smart contract security · vulnerability detection · automated dataset creation · LLM benchmarking · blockchain auditing · quality assurance · divide-and-conquer extraction

The pith

GiANT automates extraction of 7,711 smart contract vulnerabilities from 388 real-world audit reports using LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GiANT as an automated way to build smart contract auditing datasets by distilling vulnerability information from existing human-written audit reports. It uses a divide-and-conquer approach with chain-of-thought reasoning to structure the vulnerability data and an LLM judge to control quality. The method aims to overcome the scale and granularity limits of manually curated datasets. Running it on 388 Code4rena reports produces a corpus of 7,711 findings that human raters score at a mean of 4.76 out of 5, with inter-rater agreement κ = 0.88. The dataset is then used to benchmark four LLMs on vulnerability detection, code summarization, mitigation recommendation, and gas optimization, establishing performance baselines.

Core claim

The central claim is that running the GiANT framework on 388 real-world audit reports yields the GiAnt Corpus of 7,711 vulnerability findings across five severity levels. Extraction relies on divide-and-conquer plus chain-of-thought prompting to pull out structured details, and an LLM-as-a-judge step is used for quality control. Human reviewers rate the output at a mean of 4.76 ± 0.37 out of 5 with inter-rater agreement κ = 0.88. Benchmarks on the corpus give baseline results for LLMs performing vulnerability detection, code summarization, mitigation recommendation, and gas optimization.

What carries the argument

Divide-and-conquer strategy with Chain-of-Thought for structured extraction from reports, paired with an LLM-as-a-judge quality control step.
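A minimal sketch of how such a pipeline can be wired may make the division of labour concrete. It is not GiANT's implementation: the `## [H-01] Title` heading layout is an assumed report format, `divide` and `curate` are hypothetical names, and `extract` and `judge` are hypothetical callables standing in for the Chain-of-Thought extraction prompt and the LLM judge. The acceptance threshold of 4.0 is likewise illustrative.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumed layout: each finding starts with a heading like "## [H-01] Title".
FINDING_HEADING = re.compile(r"^#+[ \t]*\[([A-Z])-(\d+)\][ \t]*(.+)$", re.MULTILINE)


@dataclass
class Finding:
    finding_id: str  # e.g. "H-01"
    title: str
    body: str        # raw text of this finding's section


def divide(report_text: str) -> list[Finding]:
    """Divide: split one report into independent per-finding chunks."""
    matches = list(FINDING_HEADING.finditer(report_text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        chunks.append(Finding(f"{m.group(1)}-{m.group(2)}", m.group(3).strip(),
                              report_text[m.end():end].strip()))
    return chunks


def curate(report_text: str,
           extract: Callable[[Finding], dict],
           judge: Callable[[Finding, dict], float],
           threshold: float = 4.0) -> list[dict]:
    """Conquer each chunk with `extract`; keep records the judge scores >= threshold."""
    kept = []
    for finding in divide(report_text):
        record = extract(finding)  # stand-in for a CoT-prompted LLM call
        if judge(finding, record) >= threshold:
            kept.append(record)
    return kept


# Usage with stub callables in place of real LLM calls.
report = """# Findings
## [H-01] Reentrancy in withdraw()
The withdraw function sends ETH before updating the balance.
## [M-01] Missing zero-address check
setOwner accepts address(0).
"""

def stub_extract(f: Finding) -> dict:
    return {"id": f.finding_id, "title": f.title, "description": f.body}

def stub_judge(f: Finding, record: dict) -> float:
    return 5.0 if record["description"] else 1.0

records = curate(report, stub_extract, stub_judge)
```

Splitting first keeps each extraction call small and independent, which is the usual motivation for a divide-and-conquer design; the judge then acts as a per-record filter rather than a whole-report check.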

If this is right

The GiAnt Corpus allows benchmarking of LLMs on vulnerability detection and related auditing tasks.
It establishes performance baselines for state-of-the-art models on those tasks.
The approach offers a scalable alternative to manual dataset curation for smart contract auditing.
Structured data across severity levels facilitates diverse research applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying the same extraction process to reports from other auditing sources could expand available data for security research.
High-quality extracted datasets might accelerate development of AI systems that assist or replace parts of human audits.
Patterns identified across the large corpus could inform better vulnerability prevention strategies in contract development.

Load-bearing premise

The LLM extraction and quality control process accurately and completely captures the vulnerabilities described in the original human audit reports without systematic errors or omissions.

What would settle it

A side-by-side manual audit of extracted entries against their source reports that finds consistent missing vulnerabilities or misrepresented details.
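One way to run that check is sketched below, assuming an annotator has enumerated the finding IDs that appear in each source report. The function name and the choice to match on IDs are hypothetical; ID matching only catches omissions and spurious entries, so detecting misrepresented details would still need a field-by-field comparison.

```python
def completeness_audit(source_ids, extracted_ids):
    """ID-level comparison of a source report against its extracted corpus entries."""
    source_ids, extracted_ids = set(source_ids), set(extracted_ids)
    matched = source_ids & extracted_ids
    return {
        # Share of source findings that made it into the corpus (1.0 if nothing to find).
        "recall": len(matched) / len(source_ids) if source_ids else 1.0,
        # Share of corpus entries that correspond to a real source finding.
        "precision": len(matched) / len(extracted_ids) if extracted_ids else 1.0,
        "missing": sorted(source_ids - extracted_ids),
        "spurious": sorted(extracted_ids - source_ids),
    }

# Hypothetical annotation for one report: H-02 was dropped during extraction.
audit = completeness_audit({"H-01", "H-02", "M-01", "M-02"}, {"H-01", "M-01", "M-02"})
print(audit)
```

Summing matched, source, and extracted counts across all audited reports before dividing would give corpus-level (micro-averaged) figures.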

Figures

Figures reproduced from arXiv: 2606.07363 by Feifei Niu, Xiaoting Zhang, Xing Hu, Xin Xia, Yiran Lv, Zhipeng Gao.

**Figure 1.** Overview of the GiAnt framework.

**Figure 2.** An example record of our dataset in JSON format.

**Figure 3.** Confusion matrices of LLMs for multi-class vulnerability prediction.
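For readers unfamiliar with the Figure 3 format, a multi-class confusion matrix tallies, for each true class (row), how often each class was predicted (column). The sketch below uses illustrative vulnerability-type labels and made-up predictions, not the label set or outputs used in the paper.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns the predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def per_class_recall(matrix, labels):
    """Diagonal cell divided by its row total (0.0 for empty rows)."""
    return {lab: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (lab, row) in enumerate(zip(labels, matrix))}

labels = ["reentrancy", "access-control", "price-oracle"]
y_true = ["reentrancy", "reentrancy", "access-control", "price-oracle", "access-control"]
y_pred = ["reentrancy", "access-control", "access-control", "price-oracle", "price-oracle"]
cm = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(cm, labels)
```

Off-diagonal cells show which classes a model confuses with each other, which a single accuracy number hides.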

Original abstract

High-quality smart contract auditing datasets are crucial for evaluating security tools and advancing smart contract security research. Two major limitations of existing datasets are the manual-induced scalability bottleneck and the deficiency in data granularity and diversity. To address these limitations, we propose GiANT, an automated framework designed to curate smart contract auditing datasets by distilling vulnerability insights from real-world auditing reports. GiANT employs a divide-and-conquer strategy coupled with the Chain-of-Thought technique to extract structured vulnerability information from Code4rena reports, followed by an LLM-as-a-judge mechanism to perform rigorous quality assurance. To evaluate GiANT's effectiveness, we run it on 388 real-world audit reports and generate the GiAnt Corpus comprising 7,711 vulnerability findings across five severity levels. Manual assessment of the dataset demonstrates exceptional reliability in information extraction, achieving a mean quality score of $4.76\pm0.37$ (out of 5) with inter-rater agreement $\kappa$ of 0.88. We further validate the practicality of our dataset by benchmarking 4 state-of-the-art LLMs on vulnerability detection, code summarization, mitigation recommendation, and automated gas optimization tasks, to establish performance baselines, thereby providing a valuable data foundation for future research in automated smart contract auditing.
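The two headline reliability numbers can be computed from raw ratings along the following lines. This sketch assumes two raters scoring the same items on a 1 to 5 scale, that ± denotes the sample standard deviation of per-item scores, and that κ is unweighted Cohen's kappa; the paper may use a different variant (for example weighted or Fleiss' kappa), and the ratings below are made up.

```python
from collections import Counter
from statistics import mean, stdev

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal rates per category.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up 1-5 quality ratings for six extracted findings.
rater_a = [5, 5, 4, 5, 3, 5]
rater_b = [5, 5, 4, 4, 3, 5]
item_scores = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
print(f"quality = {mean(item_scores):.2f} ± {stdev(item_scores):.2f}")
print(f"kappa = {cohen_kappa(rater_a, rater_b):.2f}")
```

Note that both numbers describe the retained records and the raters, not how many source findings were missed.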

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GiANT builds a new 7,711-item vulnerability corpus from Code4rena reports via an LLM pipeline, but the human checks only rate what was kept and do not test whether everything in the originals was captured.

The main takeaway is a new dataset of 7,711 structured vulnerability findings pulled from 388 real audit reports. The method combines divide-and-conquer extraction with chain-of-thought and an LLM judge for quality filtering, then reports a mean manual quality score of 4.76 out of 5 and kappa of 0.88 on sampled items.

The work scales up data creation in a domain that has been bottlenecked by manual effort. Prior datasets were smaller and less granular, so producing something this size automatically is a practical step. The benchmarks on four LLMs for detection, summarization, mitigation, and gas optimization give concrete starting points for others to compare against.

The soft spot is the missing recall check. The manual assessment looks only at the quality and readability of the extracted findings. It does not compare back to the source reports to confirm that every mentioned vulnerability was pulled out or that none were altered or dropped. That gap matters for anyone who wants to treat the corpus as a complete benchmark. The abstract also gives little detail on how the manual sample was chosen or what prompts were used, which leaves room for unexamined bias.

This is aimed at people building or testing automated auditing tools for smart contracts. It supplies more data than was available before, even if the extraction fidelity needs more scrutiny.

I would send it for peer review. The corpus itself is new and the reported numbers are specific, so referees can evaluate the extraction process and suggest the missing completeness tests.

Referee Report

3 major / 2 minor

Summary. The paper introduces the GiANT framework, which applies an LLM-based divide-and-conquer strategy with Chain-of-Thought prompting to extract structured vulnerability findings from 388 Code4rena audit reports, followed by an LLM-as-a-judge quality filter. This produces the GiAnt Corpus containing 7,711 vulnerability findings across five severity levels. The authors report a mean manual quality score of 4.76±0.37 (out of 5) with inter-rater agreement κ=0.88 on sampled items and demonstrate utility by benchmarking four state-of-the-art LLMs on vulnerability detection, code summarization, mitigation recommendation, and gas optimization tasks.

Significance. If the extraction process can be shown to be both high-precision and complete, the resulting corpus would address key scalability and granularity limitations in existing smart-contract auditing datasets and provide a reusable foundation for training and evaluating automated auditing tools. The reported benchmarking baselines would then serve as a concrete reference point for future LLM-based auditing research.

major comments (3)

[Evaluation section] The manual assessment reports quality scores and κ=0.88 on sampled extracted findings but supplies no information on the sampling procedure for the 388 reports or the 7,711 items, the exact extraction and judging prompts, exclusion criteria, or controls for LLM-judge bias. These omissions directly affect the strength of the claim that the corpus is a reliable distillation of the source reports.
[Corpus construction] The GiANT pipeline is evaluated only on precision-oriented metrics (quality of retained items). No recall or completeness audit is performed against the original human-written audit reports to verify that every mentioned vulnerability was extracted without systematic omission or distortion. This gap is load-bearing for the central claim that the 7,711 findings constitute a faithful, high-granularity corpus suitable for benchmarking.
[Benchmarking experiments] The performance baselines for the four LLMs are presented without discussion of how potential under-extraction in the corpus could bias the measured task accuracies or limit the generalizability of the reported results.

minor comments (2)

[Abstract and Results] The abstract states that findings span five severity levels; the main text should explicitly map these levels to the original Code4rena severity categories and report the distribution across the 7,711 items.
[Figures and Tables] Figure captions and table headers should include the exact number of reports and findings used in each evaluation step to improve reproducibility.
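The distribution the referee asks for is a simple tally once each record carries a severity field. A sketch, assuming a hypothetical `severity` key and illustrative labels (the five GiAnt levels are not named in this summary):

```python
from collections import Counter

def severity_distribution(records, key="severity"):
    """Map each severity label to (count, percentage of all records)."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {level: (n, round(100 * n / total, 1)) for level, n in counts.most_common()}

# Illustrative records; real corpus records would carry more fields.
records = [{"severity": "High"}, {"severity": "Medium"},
           {"severity": "Medium"}, {"severity": "Low"}]
print(severity_distribution(records))
```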

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. Below we provide point-by-point responses to the major comments, indicating the revisions we will make to address them.

Referee: [Evaluation section] The manual assessment reports quality scores and κ=0.88 on sampled extracted findings but supplies no information on the sampling procedure for the 388 reports or the 7,711 items, the exact extraction and judging prompts, exclusion criteria, or controls for LLM-judge bias. These omissions directly affect the strength of the claim that the corpus is a reliable distillation of the source reports.

Authors: We agree these details are necessary for reproducibility and to support the reliability claim. In the revised manuscript we will expand the Evaluation section to specify the sampling procedure (random selection of 30 reports and 150 findings), include the exact extraction and judging prompts in an appendix, state the exclusion criteria applied during sampling, and add a comparison of LLM-judge outputs against human annotations on a 50-item subset to address potential judge bias. revision: yes
Referee: [Corpus construction] The GiANT pipeline is evaluated only on precision-oriented metrics (quality of retained items). No recall or completeness audit is performed against the original human-written audit reports to verify that every mentioned vulnerability was extracted without systematic omission or distortion. This gap is load-bearing for the central claim that the 7,711 findings constitute a faithful, high-granularity corpus suitable for benchmarking.

Authors: We acknowledge that the absence of a recall evaluation limits the strength of the completeness claim. Our validation emphasized precision of retained items. We will add a dedicated Limitations section that explicitly discusses the lack of a full recall audit, notes the resource constraints that prevented it, and clarifies that the corpus is presented as a high-precision extraction rather than a guaranteed exhaustive one. revision: yes
Referee: [Benchmarking experiments] The performance baselines for the four LLMs are presented without discussion of how potential under-extraction in the corpus could bias the measured task accuracies or limit the generalizability of the reported results.

Authors: We agree that potential bias from incomplete extraction should be addressed. In the revised Benchmarking section we will add a paragraph noting that the high manual quality scores and inter-rater agreement provide indirect support for limited under-extraction, that any missed items would likely affect all models similarly (preserving relative rankings), and that the reported accuracies should be interpreted as potentially conservative lower bounds. revision: yes
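The sampling procedure promised in the simulated first response (30 reports, then 150 findings) could be made reproducible with a seeded two-stage draw such as the following sketch. The function name, the dictionary layout, the seed, and the toy corpus are assumptions for illustration only.

```python
import random

def two_stage_sample(findings_by_report, n_reports=30, n_findings=150, seed=0):
    """Seeded two-stage sample: draw reports, then draw findings from those reports."""
    rng = random.Random(seed)
    # Sort report IDs so the draw depends only on the seed, not on dict order.
    report_ids = sorted(findings_by_report)
    reports = rng.sample(report_ids, k=min(n_reports, len(report_ids)))
    # Pool every finding from the chosen reports, keeping its report ID.
    pool = [(r, f) for r in reports for f in findings_by_report[r]]
    findings = rng.sample(pool, k=min(n_findings, len(pool)))
    return reports, findings

# Toy corpus: 40 reports with 10 findings each (made-up IDs).
corpus = {f"report-{i:02d}": [f"F-{i:02d}-{j}" for j in range(10)] for i in range(40)}
reports, findings = two_stage_sample(corpus)
```

Publishing the seed and the sorted ID list alongside the sample lets others regenerate exactly the same items for re-annotation.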

Circularity Check

0 steps flagged

No circularity in empirical corpus curation pipeline

The paper describes an LLM-based divide-and-conquer extraction pipeline applied to 388 Code4rena reports, producing 7,711 findings whose quality is measured by independent human raters (mean score 4.76±0.37, κ=0.88). No equations, parameter fitting, predictions derived from fitted inputs, or self-citation chains appear in the provided text. The central claim rests on external human validation rather than any quantity defined from the extraction process itself; therefore the derivation chain is self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that current LLMs can perform reliable structured extraction and quality judgment from technical audit text; no free parameters or invented entities are introduced.

axioms (2)

Domain assumption: LLM divide-and-conquer plus Chain-of-Thought prompting can extract structured vulnerability information from Code4rena reports at scale.
This premise is invoked to justify running the pipeline on 388 reports and claiming the resulting corpus is high-quality.
Domain assumption: an LLM-as-a-judge mechanism provides rigorous quality assurance comparable to human review.
The abstract relies on this for its quality assurance step; the reported 4.76 mean score and κ = 0.88 come from human raters, and no agreement between the LLM judge and human reviewers is reported.

pith-pipeline@v0.9.1-grok · 5771 in / 1447 out tokens · 19023 ms · 2026-06-27T21:48:58.204811+00:00

[1] [1]

https://github.com/pymupdf/PyMuPDF

Pymupdf. https://github.com/pymupdf/PyMuPDF

[2] [2]

https://zenodo.org/records/19325553 (2026)

Our replication package. https://zenodo.org/records/19325553 (2026)

arXiv 2026

[3] [3]

arXiv:2303.08774 (2023)

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv:2303.08774 (2023)

Pith/arXiv arXiv 2023

[4] [4]

arXiv preprint arXiv:2309.16609 (2023)

Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., Huang, F., et al.: Qwen technical report. arXiv preprint arXiv:2309.16609 (2023)

Pith/arXiv arXiv 2023

[5] [5]

Automated Software Engineering31(2), 63 (2024)

Chen, J., Hu, J., Xia, X., Lo, D., Grundy, J., Gao, Z., Chen, T.: Angels or demons: investigating and detecting decentralized financial traps on ethereum smart con- tracts. Automated Software Engineering31(2), 63 (2024)

2024

[6] [6]

IEEE Transactions on Software Engineering (2025)

Chen, J., Shao, Z., Yang, S., Shen, Y., Wang, Y., Chen, T., Shan, Z., Zheng, Z.: Numscout: Unveiling numerical defects in smart contracts using llm-pruning symbolic execution. IEEE Transactions on Software Engineering (2025)

2025

[7] [7]

Chen, J., Shen, Y., Zhang, J., Li, Z., Grundy, J., Shao, Z., Wang, Y., Wang, J., Chen, T., Zheng, Z.: Forge: An llm-driven framework for large-scale smart contract vulnerability dataset construction (2025), https://arxiv.org/abs/2506.18795

arXiv 2025

[8] [8]

Code4rena: https://code4rena.com/

[9] [9]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Dai, Z., Chen, B., Zhao, Z., Tang, X., Wu, S., Yao, C., Gao, Z., Chen, J.: Less is more: Adaptive program repair with bug localization and preference learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 128–136 (2025)

2025

[10] [10]

In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dai, Z., Yao, C., Han, W., Yuanying, Y., Gao, Z., Chen, J.: Mpcoder: Multi-user personalized code generator with explicit and implicit style representation learning. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 3765–3780 (2024)

2024

[11] Dai, Z., Zhao, Z., Wang, H., Tang, X., Wu, S., Yao, C., Gao, Z., Chen, J.: Learner-tailored program repair: A solution generator with iterative edit-driven retrieval enhancement. arXiv preprint arXiv:2601.08545 (2026)

[12] DeFiLlama: https://defillama.com/ (2026)

[13] Ding, Y., Fu, Y., Ibrahim, O., Sitawarin, C., Chen, X., Alomair, B., Wagner, D., Ray, B., Chen, Y.: Vulnerability detection with code language models: How far are we? arXiv preprint arXiv:2403.18624 (2024)

[14] Durieux, T., Ferreira, J.F., Abreu, R., Cruz, P.: Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. ICSE ’20. https://doi.org/10.1145/3377811.3380364

[15] Feist, J., Grieco, G., Groce, A.: Slither: a static analysis framework for smart contracts. In: 2019 IEEE/ACM 2nd international workshop on emerging trends in software engineering for blockchain (WETSEB). pp. 8–15. IEEE (2019)

[16] Gao, Z.: When deep learning meets smart contracts. In: Proceedings of the 35th IEEE/ACM international conference on automated software engineering. pp. 1400–1402 (2020)

[17] Gao, Z., Jayasundara, V., Jiang, L., Xia, X., Lo, D., Grundy, J.: Smartembed: A tool for clone and bug detection in smart contracts through structural code embedding. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). pp. 394–397. IEEE (2019)

[18] Gao, Z., Jiang, L., Xia, X., Lo, D., Grundy, J.: Checking smart contracts with structural code embedding. IEEE Transactions on Software Engineering 47(12), 2874–2891 (2020)

[19] Hedera Hashgraph, LLC: What is a smart contract audit? https://hedera.com/learning/smart-contract-audit/

[20] Hu, X., Gao, Z., Xia, X., Lo, D., Yang, X.: Automating user notice generation for smart contract functions. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 5–17. IEEE (2021)

[21] Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y.J., Madotto, A., Fung, P.: Survey of hallucination in natural language generation. ACM Computing Surveys 55(12) (2023). https://doi.org/10.1145/3571730

[22] Li, X., Li, Z., Li, W., Zhang, Y., Wang, X.: No more hidden pitfalls? exposing smart contract bad practices with llm-powered hybrid analysis. ACM Trans. Softw. Eng. Methodol. (2026). https://doi.org/10.1145/3795692, just accepted

[23] Lin, H., Gao, Z., Chen, J., Chen, X., Yang, X., Bao, L.: Actaint: Agent-based taint analysis for access control vulnerabilities in smart contracts. In: 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 2555–2567. IEEE (2025)

[24] Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al.: Deepseek-v3 technical report. arXiv:2412.19437 (2024)

[25] Liu, P., Liu, J., Fu, L., Lu, K., Xia, Y., Zhang, X., Chen, W., Weng, H., Ji, S., Wang, W.: Exploring ChatGPT’s capabilities on vulnerability management. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 811–828 (2024)

[26] Liu, X., Tan, Y., Xiao, Z., Zhuge, J., Zhou, R.: Not the end of story: An evaluation of chatgpt-driven vulnerability description mappings. In: Findings of the Association for Computational Linguistics: ACL 2023. pp. 3724–3731 (2023)

[27] Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C., Drain, D., Jiang, D., Tang, D., et al.: Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv:2102.04664 (2021)

[28] Luu, L., Chu, D.H., Olickel, H., Saxena, P., Hobor, A.: Making smart contracts smarter. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. pp. 254–269 (2016)

[29] Mai, Y., Gao, Z., Hu, X., Bao, L., Liu, Y., Sun, J.: Are human rules necessary? generating reusable apis with cot reasoning and in-context learning. Proceedings of the ACM on Software Engineering 1(FSE), 2355–2377 (2024)

[30] Mai, Y., Gao, Z., Wang, H., Bi, T., Hu, X., Xia, X., Sun, J.: Towards better answers: Automated stack overflow post updating. In: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). pp. 591–603. IEEE (2025)

[31] OWASP Foundation: OWASP smart contract top 10 - 2026. https://owasp.org/www-project-smart-contract-top-10/

[32] Ruan, Y., Gao, Z., Chen, J., Bao, L., Yang, X.: Improving gas efficiency in smart contracts: Data-driven insights and llm-assisted remediation. IEEE Transactions on Software Engineering (2026)

[33] Sendner, C., Petzi, L., Stang, J., Dmitrienko, A.: Large-scale study of vulnerability scanners for ethereum smart contracts. In: 2024 IEEE Symposium on Security and Privacy (SP). pp. 2273–2290. IEEE (2024)

[34] SolidityScan: Web3HackHub: 2025 Web3 security incidents statistics. https://solidityscan.com/web3hackhub?year=2025, accessed: 2026-03-02

[35] SolidityScan: Web3HackHub 2024 annual security report: Analyzing 149 incidents and $1.42b losses. Tech. rep., SolidityScan (2024), https://solidityscan.com/

[36] Sun, Y., Wu, D., Xue, Y., Liu, H., Ma, W., Zhang, L., Liu, Y., Li, Y.: Llm4vuln: A unified evaluation framework for decoupling and enhancing llms’ vulnerability reasoning. arXiv:2401.16185 (2024)

[37] Sun, Y., Wu, D., Xue, Y., Liu, H., Wang, H., Xu, Z., Xie, X., Liu, Y.: Gptscan: Detecting logic vulnerabilities in smart contracts by combining gpt with program analysis. In: Proceedings of the IEEE/ACM 46th international conference on software engineering. pp. 1–13 (2024)

[38] Tsankov, P., Dan, A., Drachsler-Cohen, D., Gervais, A., Buenzli, F., Vechev, M.: Securify: Practical security analysis of smart contracts. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (2018)

[39] Wang, H., Gao, Z., Hu, X., Lo, D., Grundy, J., Wang, X.: Just-in-time todo-missed commits detection. IEEE Transactions on Software Engineering 50(11), 2732–2752 (2024)

[40] Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A., et al.: Experimentation in software engineering, vol. 236. Springer (2012)

[41] Xia, B., Bi, T., Xing, Z., Lu, Q., Zhu, L.: An empirical study on software bill of materials: Where we stand and the road ahead (2023), https://arxiv.org/abs/2301.05362

[42] Xia, S., He, M., Song, L., Zhang, Y.: Sc-bench: A large-scale dataset for smart contract auditing. In: 2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code). pp. 57–64 (2025). https://doi.org/10.1109/LLM4Code66737.2025.00012

[43] Xiang, J., Gao, Z., Bao, L., Hu, X., Chen, J., Xia, X.: Automating comment generation for smart contract from bytecode. ACM Transactions on Software Engineering and Methodology 34(3), 1–31 (2025)

[44] Xue, Z., Gao, Z., Wang, S., Hu, X., Xia, X., Li, S.: Selfpico: Self-guided partial code execution with llms. In: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. pp. 1389–1401 (2024)

[45] Xue, Z., Zhang, X., Gao, Z., Hu, X., Gao, S., Xia, X., Li, S.: Clean code, better models: Enhancing llm performance with smell-cleaned dataset. ACM Transactions on Software Engineering and Methodology (2025)

[46] Yan, D., Gao, Z., Liu, Z.: A closer look at different difficulty levels code generation abilities of chatgpt. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 1887–1898. IEEE (2023)

[47] Yashavant, C.S., Kumar, S., Karkare, A.: Scrawld: A dataset of real world ethereum smart contracts labelled with vulnerabilities. arXiv:2202.11409 (2022)

[48] Yu, J., Gao, Z., Bao, L., Liu, Z.: Enhancing domain-specific code completion via collaborative inference with large and small language models. ACM Transactions on Software Engineering and Methodology (2025)

[49] Zhang, P., Xiao, F., Luo, X.: A framework and dataset for bugs in ethereum smart contracts. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). pp. 139–150. IEEE (2020)

[50] Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675 (2019)

[51] Zhang, Z., Zhang, B., Xu, W., Lin, Z.: Demystifying exploitable bugs in smart contracts. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). pp. 615–627. IEEE (2023)

[52] Zheng, Z., Su, J., Chen, J., Lo, D., Zhong, Z., Ye, M.: Dappscan: Building large-scale datasets for smart contract weaknesses in dapp projects. IEEE Transactions on Software Engineering 50(6) (2024). https://doi.org/10.1109/tse.2024.3383422

[53] Zheng, Z., Xie, S., Dai, H.N., Chen, W., Chen, X., Weng, J., Imran, M.: An overview on smart contracts: Challenges, advances and platforms. Future Generation Computer Systems 105, 475–491 (2020)