pith. machine review for the scientific record.

arxiv: 2605.09051 · v1 · submitted 2026-05-09 · 💻 cs.SE

Recognition: no theorem link

ParityFuzz: Finding Inconsistencies across Solidity Compilers via Fine-Grained Mutation and Differential Analysis

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:26 UTC · model grok-4.3

classification 💻 cs.SE
keywords Solidity compilers · differential testing · fuzzing · smart contracts · inconsistency detection · mutation rules · reinforcement learning

The pith

ParityFuzz identifies inconsistencies among Solidity compilers by mutating contracts and comparing normalized outputs across different environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a testing approach for Solidity smart contracts that checks whether different compilers produce the same results for the same code. Inconsistencies can block contract migration between platforms, confuse debugging, and open security holes that lead to financial losses. ParityFuzz first analyzes compilers to create mutation rules focused on syntax and boundary values, then applies reinforcement learning to pick mutations that produce useful test cases. It compiles and executes the mutated contracts on multiple compilers in their own environments, normalizes the results, and flags differences as inconsistencies. Evaluation on six compilers shows the method reaches higher success rates and coverage than earlier fuzzers while finding 64 new inconsistencies, some already fixed by developers.

Core claim

ParityFuzz operates in three stages: deriving syntax- and boundary-oriented mutation rules by examining compilers and execution environments, using reinforcement learning to choose effective mutations for generating test programs, and compiling plus executing those programs across multiple compilers followed by normalization and comparison to locate inconsistencies.
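As an editorial illustration, the third stage's compare-after-normalize loop can be sketched in a few lines. Every name here (`normalize`, `find_inconsistencies`, the backend callables) is hypothetical, and the paper's actual normalization rules are far richer than the single field-dropping step shown.

```python
def normalize(raw):
    # Drop environment-specific fields before comparing. Which fields to
    # drop is an assumption for this sketch; the paper's normalization
    # handles gas accounting, memory layout, and platform artifacts.
    return {k: v for k, v in raw.items() if k not in {"gas_used", "address"}}

def find_inconsistencies(program, backends):
    # backends: compiler name -> callable(program) -> raw result dict.
    # Run the program on every backend, normalize, and flag any backend
    # whose normalized result diverges from the first one.
    results = {name: normalize(run(program)) for name, run in backends.items()}
    names = list(results)
    baseline = results[names[0]]
    return [name for name in names[1:] if results[name] != baseline]
```

With toy backends that differ only in gas usage, no inconsistency is flagged; a backend that changes the output is.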

What carries the argument

Fine-grained mutation rules combined with reinforcement learning selection and cross-environment differential analysis after result normalization.
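The rule-selection idea can be illustrated with the simplest possible stand-in: an epsilon-greedy bandit over per-rule success statistics. The paper's selector is a learned RL policy, not this heuristic, and all names here are invented for the sketch.

```python
import random

def pick_rule(stats, eps=0.2, rng=random):
    # stats: rule name -> (times the rule produced a useful test, times tried).
    # With probability eps explore a random rule; otherwise exploit the rule
    # with the best empirical success rate. A stand-in for ParityFuzz's RL
    # selector, not its actual algorithm.
    if rng.random() < eps:
        return rng.choice(list(stats))
    return max(stats, key=lambda r: stats[r][0] / max(stats[r][1], 1))
```

With exploration disabled (`eps=0.0`), the rule with the highest observed success rate is always chosen.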

If this is right

  • Up to 18 times higher compilation success rate than prior fuzzers.
  • 1.8 times higher code coverage than state-of-the-art tools.
  • Detection of 64 previously unknown inconsistencies across six compilers.
  • Eleven issues have been fixed in the affected compilers.
  • The findings earned a bounty from the Polkadot community.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mutation-plus-normalization pattern could be adapted to test consistency in other languages that have several independent compilers.
  • Inconsistencies found this way may indicate deeper semantic mismatches that could be turned into exploits if left unaddressed.
  • Developers migrating contracts should add cross-compiler checks as a standard step before deployment.

Load-bearing premise

The normalization process and cross-environment comparisons separate real compiler differences from artifacts introduced by the testing setup or normalization rules themselves.

What would settle it

Re-running the same mutated contracts in a single shared execution environment and checking whether the reported inconsistencies remain or disappear would show if they stem from genuine compiler behavior.

Figures

Figures reproduced from arXiv: 2605.09051 by Bowei Su, Mingxi Ye, Peilin Zheng, Yuhong Na, Zibin Zheng.

Figure 1. The compilation and execution process under different Solidity compilers.
Figure 2. A case of compilation status inconsistency: Solc accepts the program while Solang reports a compilation error, originating in Solang's storage_align.
Figure 3. The source code of the Solang compiler that handles struct.
Figure 4. Workflow of ParityFuzz.
Figure 5. Prompts for deriving mutation rules: from boundary-condition code blocks, ParityFuzz identifies triggering program features and simple syntax-oriented rules, then builds boundary-oriented rules on top of them.
Figure 7. The prompt used for generating boundary-oriented mutation rules.
Figure 8. Prompt for selecting features and mutation rules.
Figure 9. Prompt for mutating and repairing programs.
Figure 10. The original version of the program in …
Figure 11. An example of execution status inconsistency: the executor of Solc runs successfully, while the executor of Revive fails.
Figure 12. The source code of the executor in Revive that handles delegatecall.
Figure 17. The quality of 1000 programs generated by different fuzzers.
Figure 15. The original version of the program in …
Figure 16. An example of error message inconsistency: Solc produces a compilation error without a clear message, while Sold and Solang provide clear error messages.
Figure 18. The original version of the program in …
Figure 19. An example of compilation status inconsistency: compilation error with Solang vs. success with Solc.
read the original abstract

The Solidity smart contract ecosystem has rapidly grown, leading to multiple compilers targeting different blockchain platforms or improving compilation efficiency. Although many compilers aim to be compatible with the primary Solidity compiler (Solc), significant inconsistencies in compilation and execution remain. These inconsistencies hinder contract migration, mislead developers during debugging, and may introduce exploitable vulnerabilities, causing financial losses. Existing testing techniques mainly focus on bugs within a single compiler or perform differential testing in the same execution environment. However, they are insufficient for detecting cross-compiler inconsistencies, as they lack mechanisms to explore triggering conditions and compare bytecode across environments. We propose ParityFuzz, a cross-compiler differential testing framework for Solidity. It operates in three stages. First, it derives mutation rules, including syntax- and boundary-oriented rules, by analyzing compilers and execution environments. Second, it uses reinforcement learning to select effective mutation rules for test generation. Third, it compiles and executes programs across multiple compilers, then normalizes and compares results to detect inconsistencies. Our evaluation shows ParityFuzz is efficient and effective. It achieves up to 18x higher compilation success rate and 1.8x higher code coverage than state-of-the-art fuzzers. It uncovers 64 previously unknown inconsistencies across six compilers. Notably, 11 issues have been fixed, and our findings received a bounty from the Polkadot community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents ParityFuzz, a three-stage cross-compiler differential testing framework for Solidity. Stage 1 derives syntax- and boundary-oriented mutation rules by analyzing compilers and execution environments; Stage 2 applies reinforcement learning to select effective rules for generating test programs; Stage 3 compiles and executes the programs across six compilers, normalizes the outputs, and compares them to detect inconsistencies. Evaluation reports up to 18x higher compilation success rate and 1.8x higher code coverage than prior fuzzers, plus discovery of 64 previously unknown inconsistencies (11 fixed, with a Polkadot bounty).

Significance. If the normalization and comparison reliably isolate true semantic inconsistencies rather than harness or platform artifacts, the work would be a useful contribution to compiler compatibility and security in the Solidity ecosystem. The practical outcomes (fixes and bounty) provide concrete evidence of impact. However, the empirical claims rest on an under-specified comparison step whose soundness directly determines whether the 64 findings and performance gains are trustworthy.

major comments (2)
  1. [Evaluation] Evaluation section: the normalization rules and cross-environment comparison procedure (the core of Stage 3) receive no concrete description, pseudocode, or examples. No ablation study is reported that shows how altering or removing the normalization changes the count of 64 inconsistencies, and no independent validation (manual review, re-execution oracle, or false-positive rate) is provided. This directly undermines the central claim that the detected differences are previously unknown semantic inconsistencies rather than artifacts of the testing harness, gas accounting, memory layout, or EVM/WASM differences.
  2. [Approach] Approach, third stage: the exact comparison metric used to declare an inconsistency (e.g., bytecode equality after normalization, execution trace equivalence, or output equality) is not stated, nor is any mechanism described for controlling false positives arising from environment setup. Without these details the reported 18x success-rate and 1.8x coverage gains cannot be assessed or reproduced.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'up to 18x higher compilation success rate' is not accompanied by the identity of the baseline fuzzer or the precise experimental conditions, making the headline claim difficult to interpret at a glance.
  2. The paper would benefit from a small table or figure illustrating one concrete normalization rule and the before/after effect on a pair of compiler outputs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and reproducibility.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the normalization rules and cross-environment comparison procedure (the core of Stage 3) receive no concrete description, pseudocode, or examples. No ablation study is reported that shows how altering or removing the normalization changes the count of 64 inconsistencies, and no independent validation (manual review, re-execution oracle, or false-positive rate) is provided. This directly undermines the central claim that the detected differences are previously unknown semantic inconsistencies rather than artifacts of the testing harness, gas accounting, memory layout, or EVM/WASM differences.

    Authors: We agree that the original manuscript lacked sufficient detail on the normalization rules and cross-environment comparison procedure in Stage 3. In the revised manuscript, we will add a dedicated subsection in the Evaluation section providing a concrete description of the normalization rules (including handling of gas accounting, memory layout, and platform-specific artifacts), pseudocode for the comparison logic, and examples of programs and their normalized outputs. We will also include an ablation study quantifying the impact of normalization on the number of reported inconsistencies and report the results of a manual review of a random sample of the 64 inconsistencies, along with a false-positive rate estimate derived from re-execution oracles. These additions will directly address concerns about whether the differences represent true semantic inconsistencies. revision: yes

  2. Referee: [Approach] Approach, third stage: the exact comparison metric used to declare an inconsistency (e.g., bytecode equality after normalization, execution trace equivalence, or output equality) is not stated, nor is any mechanism described for controlling false positives arising from environment setup. Without these details the reported 18x success-rate and 1.8x coverage gains cannot be assessed or reproduced.

    Authors: The comparison metric used in Stage 3 is normalized output equality: after compilation and execution, we compare return values, emitted events, and final contract state across compilers once environment-specific elements (gas costs, absolute memory addresses, and EVM/WASM layout differences) have been stripped via normalization. We will explicitly state this metric and describe the false-positive controls (standardized harness, multiple re-executions per program, and filtering of transient differences) in the revised Approach section. These clarifications will allow readers to assess and reproduce both the inconsistency detections and the reported performance improvements. revision: yes
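The metric the simulated rebuttal describes — equality of return values, emitted events, and final contract state after stripping environment-specific fields — can be sketched as follows; the field names are assumptions chosen for illustration, not the paper's data model.

```python
def semantic_view(result):
    # Project a raw execution result down to the three components the
    # rebuttal names as the comparison metric: return values, emitted
    # events, and final contract state. Gas costs and addresses are
    # deliberately excluded from the view.
    return (
        result["returns"],
        tuple(result["events"]),
        tuple(sorted(result["state"].items())),
    )

def consistent(a, b):
    # Two executions are consistent iff their normalized views agree.
    return semantic_view(a) == semantic_view(b)
```

Two runs that differ only in gas usage compare equal; a run that drops an event does not.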

Circularity Check

0 steps flagged

No circularity: purely empirical tool paper with no derivation chain

full rationale

The paper describes a three-stage empirical fuzzing framework (mutation rule derivation from analysis, RL-based selection, and cross-compiler compilation/execution with normalization for differential comparison). No equations, predictions, uniqueness theorems, or ansatzes are presented that reduce by construction to fitted parameters, self-definitions, or self-citation chains. Results (64 inconsistencies, 18x/1.8x gains) are reported as experimental outcomes rather than tautological outputs of internal definitions. Normalization and comparison steps are methodological choices whose validity can be externally validated or falsified; they do not create circularity in any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described. Mutation rules and RL reward function are implied but not detailed enough to enumerate.

pith-pipeline@v0.9.0 · 5558 in / 1079 out tokens · 36317 ms · 2026-05-12T02:26:39.797480+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 2 internal anchors

  1. [1]

    Jit-picking: Differential fuzzing of javascript engines, in: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp

    Bernhard, L., Scharnowski, T., Schloegel, M., Blazytko, T., Holz, T., 2022. Jit-picking: Differential fuzzing of javascript engines, in: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp. 351–364

  2. [2]

Coverage-directed differential testing of JVM implementations, in: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

    Chen, Y., Su, T., Sun, C., Su, Z., Zhao, J., 2016. Coverage-directed differential testing of JVM implementations, in: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 85–99

  3. [3]

    Chen, Y., Zhong, R., Hu, H., Zhang, H., Yang, Y., Wu, D., Lee, W.,

  4. [4]

    One engine to fuzz’em all: Generic language processor testing with semantic validation, in: 2021 IEEE Symposium on Security and Privacy (SP), IEEE. pp. 642–658

  5. [5]

    Oss-fuzz

    Collective, A., 2024. Oss-fuzz. https://github.com/ethereum/solidity/tree/develop/test/tools/ossfuzz

  6. [6]

    Solc. https://github.com/ethereum/solidity

    Collective, A., 2026a. Solc. https://github.com/ethereum/solidity

  7. [7]

    Solidity history bugs

    Collective, A., 2026b. Solidity history bugs. https://github.com/ethereum/solidity/blob/develop/docs/bugs.json

  8. [8]

    Solidity test programs. https://github.com/ethereum/solidity/tree/develop/test

    Collective, A., 2026c. Solidity test programs. https://github.com/ethereum/solidity/tree/develop/test

  9. [9]

    Introducing Solar. https://www.paradigm.xyz/2024/11/solar

    DaniPopes, G.K., 2024. Introducing Solar. https://www.paradigm.xyz/2024/11/solar

  10. [10]

    DefiLlama - DeFi dashboard

    DefiLlama, 2025. DefiLlama - DeFi dashboard. https://defillama.com/

  11. [11]

    Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt

    Deng, Y., Xia, C.S., Yang, C., Zhang, S.D., Yang, S., Zhang, L., 2023. Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt. arXiv preprint arXiv:2304.02014

  12. [12]

    Eom, J., Jeong, S., Kwon, T., 2024. Fuzzing javascript interpreters with coverage-guided reinforcement learning for llm-based mutation, in: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1656–1668

  13. [13]

    GrayC: Greybox fuzzing of compilers and analysers for C, in: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

    Even-Mendoza, K., Sharma, A., Donaldson, A.F., Cadar, C., 2023. GrayC: Greybox fuzzing of compilers and analysers for C, in: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1219–1231

  14. [14]

    Protocol buffers. https://protobuf.dev/

    Google, 2024. Protocol buffers. https://protobuf.dev/

  15. [15]

    Making no-fuss compiler fuzzing effective, in: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction

    Groce, A., van Tonder, R., Kalburgi, G.T., Le Goues, C., 2022. Making no-fuss compiler fuzzing effective, in: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, pp. 194–204

  16. [16]

    Fuzzilli: Fuzzing for javascript jit compiler vulnerabilities., in: NDSS

    Groß, S., Koch, S., Bernhard, L., Holz, T., Johns, M., 2023. Fuzzilli: Fuzzing for javascript jit compiler vulnerabilities., in: NDSS

  17. [17]

    ever-node. https://github.com/everx-labs/ever-node

    Labs, E., 2023. ever-node. https://github.com/everx-labs/ever-node

  18. [18]

    Differences between sold and solc

    Labs, E., 2024a. Differences between sold and solc. https://github.com/everx-labs/TVM-Solidity-Compiler/blob/master/API.md

  19. [19]

    Sold. https://github.com/everx-labs/TVM-Solidity-Compiler

    Labs, E., 2025. Sold. https://github.com/everx-labs/TVM-Solidity-Compiler

  20. [20]

    era-test-node. https://github.com/matter-labs/era-test-node

    Labs, M., 2024b. era-test-node. https://github.com/matter-labs/era-test-node

  21. [21]

    Labs, M., 2026. Zksolc. https://github.com/matter-labs/era-compiler-solidity

  22. [22]

    Towards understanding the bugs in solidity compiler, in: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp

    Ma, H., Zhang, W., Shen, Q., Tian, Y., Chen, J., Cheung, S.C., 2024. Towards understanding the bugs in solidity compiler, in: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1312–1324

  23. [23]

    Syntax-aware mutation for testing the Solidity compiler, in: European Symposium on Research in Computer Security

    Mitropoulos, C., Sotiropoulos, T., Ioannidis, S., Mitropoulos, D., 2023. Syntax-aware mutation for testing the Solidity compiler, in: European Symposium on Research in Computer Security, Springer. pp. 327–347

  24. [24]

    NEAR, 2025. borsh. https://borsh.io/

  25. [25]

    Randir: differential testing for embedded compilers, in: Proceedings of the 2016 7th ACM SIGPLAN Symposium on Scala, pp

    Ofenbeck, G., Rompf, T., Püschel, M., 2016. Randir: differential testing for embedded compilers, in: Proceedings of the 2016 7th ACM SIGPLAN Symposium on Scala, pp. 21–30

  26. [26]

    Ou, X., Li, C., Jiang, Y., Xu, C., 2024. The mutators reloaded: Fuzzing compilers with large language model generated mutation operators, in: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pp. 298–312

  27. [27]

    Solar. https://github.com/paradigmxyz/solar

    Paradigm, 2025. Solar. https://github.com/paradigmxyz/solar

  28. [28]

    Differences between revive and solc

    Polkadot, 2026. Differences between revive and solc. https://contracts.polkadot.io/revive_compiler/differences_yul_translation/

  29. [29]

    Qwen-coder-plus

    Qwen, 2026a. Qwen-coder-plus. https://bailian.console.aliyun.com/?tab=model#/model-market/detail/qwen-coder-plus

  30. [30]

    Qwen2.5-Coder-0.5B-Instruct. https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct

    Qwen, 2026b. Qwen2.5-Coder-0.5B-Instruct. https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct

  31. [31]

    Qwen2.5-Coder-7B-Instruct. https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct

    Qwen, 2026c. Qwen2.5-Coder-7B-Instruct. https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct

  32. [32]

    Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.,

  33. [33]

    Proximal Policy Optimization Algorithms

    Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  34. [34]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y., Wu, Y., et al., 2024. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300

  35. [35]

    Differences between solang and solc

    Solang, 2023. Differences between solang and solc. https://solang.readthedocs.io/en/v0.3.3/targets/solana.html#solidity-for-solana-incompatibilities-with-solidity-for-ethereum

  36. [36]

    Solang. https://github.com/hyperledger-solang/solang

    Solang, H., 2025a. Solang. https://github.com/hyperledger-solang/solang

  37. [37]

    Test cases in solang

    Solang, H., 2025b. Test cases in Solang. https://github.com/hyperledger-solang/solang/tree/main/tests/solana_tests

  38. [38]

    Contract ABI specification

    Solidity, 2025. Contract ABI specification. https://docs.soliditylang.org/en/latest/abi-spec.html

  39. [39]

    ParityFuzz. https://zenodo.org/records/19944888

    Su, B., 2025. ParityFuzz. https://zenodo.org/records/19944888

  40. [40]

    Decoding an empty tuple causes error in solang

    Subway2023, 2025a. Decoding an empty tuple causes error in solang. https://github.com/hyperledger-solang/solang/issues/1727

  41. [41]

    Deleting an array element causes the array to be deleted in Solang. https://github.com/hyperledger-solang/solang/issues/1785

    Subway2023, 2025b. Deleting an array element causes the array to be deleted in Solang. https://github.com/hyperledger-solang/solang/issues/1785

  42. [42]

    ripemd160 is unavailable in zksolc. https://github.com/matter-labs/era-compiler-solidity/issues/275

    Subway2023, 2025c. ripemd160 is unavailable in zksolc. https://github.com/matter-labs/era-compiler-solidity/issues/275

  43. [43]

    Zksolc default codegen problem

    Subway2023, 2025d. Zksolc default codegen problem. https://github.com/matter-labs/era-compiler-solidity/issues/272

  44. [44]

    Finding and analyzing compiler warning defects, in: Proceedings of the 38th International Conference on Software Engineering, pp

    Sun, C., Le, V., Su, Z., 2016. Finding and analyzing compiler warning defects, in: Proceedings of the 38th International Conference on Software Engineering, pp. 203–213

  45. [45]

    Test cases in Revive. https://github.com/paritytech/revive/blob/main/crates/runner/src/lib.rs

    Technologies, P., 2025. Test cases in Revive. https://github.com/paritytech/revive/blob/main/crates/runner/src/lib.rs

  46. [46]

    Technologies, P., 2026. Revive. https://github.com/paritytech/revive

  47. [47]

    Differential testing solidity compiler through deep contract manipulation and mutation

    Tian, Z., Wang, F., Chen, Y., Chen, L., 2024. Differential testing solidity compiler through deep contract manipulation and mutation

  48. [48]

    Detecting c++ compiler front-end bugs via grammar mutation and differential testing

    Tu, H., Jiang, H., Zhou, Z., Tang, Y., Ren, Z., Qiao, L., Jiang, L., 2022. Detecting C++ compiler front-end bugs via grammar mutation and differential testing. IEEE Transactions on Reliability 72, 343–357

  49. [49]

    Isolating compiler bugs by generating effective witness programs with large language models

    Tu, H., Zhou, Z., Jiang, H., Yusuf, I.N.B., Li, Y., Jiang, L., 2024. Isolating compiler bugs by generating effective witness programs with large language models. IEEE Transactions on Software Engineering

  50. [50]

    Superion: Grammar-aware greybox fuzzing, in: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), IEEE

    Wang, J., Chen, B., Wei, L., Liu, Y., 2019. Superion: Grammar-aware greybox fuzzing, in: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), IEEE. pp. 724–735

  51. [51]

    FuzzJIT: Oracle-enhanced fuzzing for JavaScript engine JIT compiler, in: 32nd USENIX Security Symposium (USENIX Security 23)

    Wang, J., Zhang, Z., Liu, S., Du, X., Chen, J., 2023a. FuzzJIT: Oracle-enhanced fuzzing for JavaScript engine JIT compiler, in: 32nd USENIX Security Symposium (USENIX Security 23), pp. 1865–1882

  52. [52]

    Rustlantis: Randomized differential testing of the rust compiler

    Wang, Q., Jung, R., 2024. Rustlantis: Randomized differential testing of the rust compiler. Proceedings of the ACM on Programming Languages 8, 1955–1981

  53. [53]

    Zero-config fuzzing for microservices, in: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE

    Wang, W., Benea, A., Ivancic, F., 2023b. Zero-config fuzzing for microservices, in: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE. pp. 1840–1845

  54. [54]

    A code knowledge graph-enhanced system for llm-based fuzz driver generation

    Xu, H., Ma, W., Zhou, T., Zhao, Y., Chen, K., Hu, Q., Liu, Y., Wang, H., 2024. A code knowledge graph-enhanced system for llm-based fuzz driver generation. arXiv preprint arXiv:2411.11532

  55. [55]

    White-box compiler fuzzing empowered by large language models

    Yang, C., Deng, Y., Lu, R., Yao, J., Liu, J., Jabbarvand, R., Zhang, L., 2023. White-box compiler fuzzing empowered by large language models. arXiv preprint arXiv:2310.15991

  56. [56]

    Your fix is my exploit: Enabling comprehensive dl library api fuzzing with large language models

    Zhang, K., Wang, S., Han, J., Zhu, X., Li, X., Wang, S., Wen, S., 2025. Your fix is my exploit: Enabling comprehensive DL library API fuzzing with large language models. arXiv preprint arXiv:2501.04312

  57. [57]

    Differences between zksolc and solc

    ZKsync, 2026. Differences between zksolc and solc. https://docs.zksync.io/zksync-protocol/differences/evm-instructions