pith. machine review for the scientific record.

arxiv: 2605.12563 · v1 · submitted 2026-05-12 · 💻 cs.CR · cs.PL

Recognition: unknown

OverrideFuzz: Semantic-Aware Grammar Fuzzing for Script-Runtime Vulnerabilities

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:56 UTC · model grok-4.3

classification: 💻 cs.CR · cs.PL
keywords: grammar fuzzing · script runtimes · semantic-aware generation · override hooks · vulnerability detection · Python · Lua · JavaScript

The pith

OverrideFuzz uses two-phase, semantic-aware grammar fuzzing to reach script-runtime boundary behaviors associated with known vulnerability patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OverrideFuzz to test script-language runtimes such as Python, Lua, and JavaScript. In a declaration phase it builds objects that override methods, and in an execution phase it routes operations through those hooks. Active reflection tracks runtime types, while passive reflection uses error messages to prune invalid operation shapes, letting the generator approach semantic correctness without hand-written API rules. The target is dynamic rebinding and attribute resolution, behaviors that existing grammar fuzzers rarely model and that can expose use-after-free and type-confusion issues. The evaluation shows steady coverage growth on all three targets, with Lua gaining the most from its metamethod mechanism; no new vulnerabilities were found in the bounded run, but the final corpus contains inputs that match known vulnerability patterns.
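The two-phase loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OverrideFuzz's implementation: the hook tables `HOOK_TO_OPS` and `HOOK_BODIES`, the helpers `declaration_phase`, `execution_phase`, and `generate`, and the `trace` list are all hypothetical. The declaration phase emits a class that overrides a few dunder methods and logs each call; the execution phase emits only operations that CPython dispatches through those overrides.

```python
import random

# Hypothetical hook tables: each dunder maps to operation templates that
# CPython dispatches through that hook, and to a body returning a valid value.
HOOK_TO_OPS = {
    "__index__": ["[0, 1, 2][{obj}]", "bytes(3)[{obj}]", "range(5)[{obj}]"],
    "__len__": ["len({obj})", "bool({obj})"],
    "__eq__": ["{obj} == 0", "0 == {obj}"],
    "__hash__": ["hash({obj})", "{{{obj}: 1}}"],
}
HOOK_BODIES = {
    "__index__": "return 0",
    "__len__": "return 1",
    "__eq__": "return True",
    "__hash__": "return 0",
}

def declaration_phase(rng, n_hooks=2):
    """Emit a class overriding n_hooks dunders; each override logs its call."""
    hooks = rng.sample(sorted(HOOK_TO_OPS), n_hooks)
    lines = ["trace = []", "class C:"]
    for hook in hooks:
        params = "self, other" if hook == "__eq__" else "self"
        lines.append(f"    def {hook}({params}):")
        lines.append(f"        trace.append({hook!r})")
        lines.append(f"        {HOOK_BODIES[hook]}")
    lines.append("o = C()")
    return hooks, "\n".join(lines)

def execution_phase(rng, hooks, n_ops=4):
    """Emit operations chosen only from templates that route through `hooks`."""
    return "\n".join(
        rng.choice(HOOK_TO_OPS[rng.choice(hooks)]).format(obj="o")
        for _ in range(n_ops)
    )

def generate(seed):
    rng = random.Random(seed)
    hooks, decl = declaration_phase(rng)
    return hooks, decl + "\n" + execution_phase(rng, hooks)

hooks, program = generate(seed=1)
namespace = {}
exec(program, namespace)
print(program)
print(namespace["trace"])  # every entry is one of `hooks`
```

Executing a generated program and inspecting `trace` confirms that each emitted operation actually routed through a declared hook, which is the property the execution phase is meant to guarantee.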

Core claim

OverrideFuzz is a two-phase, semantic-aware grammar fuzzer. Its declaration phase constructs objects with overriding methods, and its execution phase generates operations routed through those hooks. Active reflection tracks runtime types, and passive reflection learns from error messages to remove invalid operation shapes, so that generation reaches the script-native boundary behaviors that can trigger use-after-free or type-confusion bugs. On CPython, Lua, and QuickJS, the approach produces consistent coverage growth and a corpus that reconstructs inputs matching known vulnerability patterns.

What carries the argument

Two-phase declaration and execution process with active type tracking and passive error-message reflection to filter operation shapes while preserving boundary-triggering ones.
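Passive reflection can be approximated by keying a blacklist on the text of runtime errors. The sketch below is an assumption about how such pruning might look for CPython binary operators, not the paper's code: an operation shape is a hypothetical `(operator, left type, right type)` triple, and the hypothetical `ShapeFilter` bans a shape when the resulting `TypeError` message has CPython's `unsupported operand type(s)` form.

```python
import itertools
import re

# CPython's wording for binary operators with no implementation for the operand types.
UNSUPPORTED = re.compile(r"unsupported operand type\(s\) for (.+?): '(\w+)' and '(\w+)'")

# Hypothetical seed values per type name and operators to combine them with.
VALUES = {"int": "1", "float": "2.0", "str": "'a'", "list": "[1]"}
OPS = ["-", "/", "//"]

class ShapeFilter:
    """Blacklist of (operator, left type, right type) shapes learned from errors."""
    def __init__(self):
        self.banned = set()

    def allowed(self, shape):
        return shape not in self.banned

    def observe(self, exc):
        match = UNSUPPORTED.search(str(exc))
        if match:
            self.banned.add(match.groups())
        return bool(match)

def fuzz_round(filt):
    """Try every allowed shape once; return (valid, rejected) counts."""
    valid = rejected = 0
    for shape in itertools.product(OPS, VALUES, VALUES):
        if not filt.allowed(shape):
            continue
        op, left, right = shape
        try:
            eval(f"{VALUES[left]} {op} {VALUES[right]}")
            valid += 1
        except TypeError as exc:
            filt.observe(exc)
            rejected += 1
    return valid, rejected

filt = ShapeFilter()
print(fuzz_round(filt))  # (12, 36): 36 shapes banned from their error text
print(fuzz_round(filt))  # (12, 0): banned shapes are no longer generated
```

On the first round every numeric pair succeeds and every mixed or non-numeric pair is banned; the second round skips the banned shapes entirely. Because the filter trusts the message text, its precision depends on how informative each runtime's errors are, which is the load-bearing premise discussed in this review.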

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Longer runs could surface previously unknown vulnerabilities once the corpus stabilizes.
  • The same declaration-plus-execution pattern could transfer to other dynamic languages that expose metamethods or prototype overrides.
  • Coverage plateaus suggest that adding explicit modeling of object lifetime or garbage-collection hooks would be a direct next step.

Load-bearing premise

That error messages carry enough detail to discard only invalid shapes without also discarding operation shapes that could trigger boundary bugs, and that the short evaluation window is long enough to reveal the fuzzer's reach.
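One way this premise can fail is easy to show in CPython: an override can run, with arbitrary side effects, before the runtime rejects its result with a `TypeError`. The snippet below is illustrative only; `BadIndex`, `events`, and `hook_reached_before_error` are hypothetical names, and the review does not say whether OverrideFuzz's pruning distinguishes this case. A filter that bans every shape ending in `TypeError` would discard this one even though the hook was reached.

```python
events = []

class BadIndex:
    """Override whose side effect happens before its invalid result is rejected."""
    def __index__(self):
        events.append("hook ran")   # side effect (could mutate shared state)
        return "not an int"         # CPython then raises TypeError

def hook_reached_before_error(op):
    """Return True if `op` raised TypeError after at least one hook call."""
    events.clear()
    try:
        op()
    except TypeError:
        return bool(events)
    return False

print(hook_reached_before_error(lambda: [0, 1, 2][BadIndex()]))  # True
print(events)  # ['hook ran']
```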

What would settle it

Extend the evaluation window on CPython, Lua, or QuickJS and check whether any newly generated inputs trigger use-after-free or type-confusion bugs that were not found by prior grammar or reflection-based fuzzers.

Figures

Figures reproduced from arXiv: 2605.12563 by Yiran Qiu.

Figure 1. Architecture of OverrideFuzz
Figure 2. CPython coverage growth over time
Figure 3. Lua coverage growth over time
Figure 4. QuickJS coverage growth over time
Figure 5. CPython per-file branch coverage treemap
Figure 7. QuickJS per-file branch coverage treemap
Original abstract

Script-language runtimes such as Python, Lua, and JavaScript are widely deployed in security-sensitive contexts, yet they remain difficult to test because valid inputs must satisfy syntax, dynamic type constraints, and object-level semantics. Existing grammar and reflection-based fuzzers improve syntactic validity and interface reachability, but they rarely model override hooks, dynamic rebinding, and attribute-resolution behavior that can redirect built-in operations across the script-native boundary and trigger use-after-free or type-confusion bugs. We present OverrideFuzz, a two-phase, semantic-aware grammar fuzzer for script-language runtimes. Its declaration phase constructs objects with overriding methods, while its execution phase generates operations that route through those hooks. Active reflection tracks runtime types, and passive reflection learns from error messages to remove invalid operation shapes, allowing generation to approach semantic correctness without manual API specification. We evaluate OverrideFuzz on CPython, Lua, and QuickJS. All three targets show consistent coverage growth, with rapid early expansion followed by slower incremental gains, and Lua benefits most from its pervasive metamethod dispatch mechanism. Although OverrideFuzz did not discover novel vulnerabilities during the bounded evaluation period, corpus analysis shows that it reconstructs inputs matching known vulnerability patterns, which suggests that semantic-aware generation reaches the intended script-native boundary behaviors.
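Active reflection, as the abstract describes it, amounts to asking the live runtime which hooks exist and what types generated values actually have. Below is a minimal CPython sketch, assuming `dir()` and `type()` as the reflection primitives (the paper's actual mechanism is not shown in this review); `overridable_dunders` and `track_types` are hypothetical helpers.

```python
def overridable_dunders(base):
    """Callable dunder attributes a subclass of `base` could override."""
    # Skip class-construction hooks and the __class__ attribute.
    skip = {"__new__", "__init_subclass__", "__subclasshook__", "__class__"}
    return sorted(
        name for name in dir(base)
        if name.startswith("__") and name.endswith("__")
        and name not in skip and callable(getattr(base, name))
    )

def track_types(namespace):
    """Map each generated (non-dunder) variable to its runtime type name."""
    return {name: type(value).__name__
            for name, value in namespace.items() if not name.startswith("__")}

decl = "class MyList(list):\n    def __len__(self):\n        return 0\nx = MyList([1, 2])\ny = 3"
namespace = {}
exec(decl, namespace)
print(track_types(namespace))  # {'MyList': 'type', 'x': 'MyList', 'y': 'int'}
print("__len__" in overridable_dunders(list))  # True
```

A generator could use the first helper to pick hooks for the declaration phase and the second to choose execution-phase operations that are type-appropriate for each live variable.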

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. OverrideFuzz is a two-phase, semantic-aware grammar fuzzer for script-language runtimes (CPython, Lua, QuickJS) that constructs objects with overriding methods in a declaration phase and generates operations routed through those hooks in an execution phase. Active reflection tracks runtime types while passive reflection learns from error messages to prune invalid operation shapes, allowing generation to approach semantic correctness without manual API specifications. The evaluation reports consistent coverage growth across targets (with Lua benefiting most from metamethod dispatch) and corpus analysis showing reconstruction of inputs that match known vulnerability patterns, although no novel vulnerabilities were discovered during the bounded evaluation period.

Significance. If the two-phase approach with passive reflection successfully models override hooks, dynamic rebinding, and attribute resolution to reach script-native boundary behaviors, it would advance automated testing of widely deployed script runtimes for use-after-free and type-confusion issues. The reported reconstruction of known vulnerability patterns provides concrete evidence that generation reaches intended semantic boundaries, and the absence of free parameters or fitted constants in the core mechanism is a strength; however, the lack of new discoveries and limited analysis of pruning effects reduce the demonstrated security impact.

major comments (2)
  1. [Abstract] The central claim that passive reflection removes only invalid shapes while preserving all operation shapes capable of triggering boundary bugs is not supported by any reported analysis of pruned shapes or ablation that re-inserts them to measure lost coverage of boundary behaviors; this assumption is load-bearing for the reachability argument.
  2. [Evaluation, as implied by the abstract's description of bounded runs] The corpus analysis reconstructs inputs matching known patterns, but the short bounded evaluation found no new vulnerabilities and provides no quantitative detail on how error-based pruning avoids false negatives on shapes that would trigger use-after-free or type-confusion when overriding methods are active.
minor comments (1)
  1. The description of coverage growth as 'rapid early expansion followed by slower incremental gains' would be strengthened by explicit reference to the time-series data or plots that support this characterization.
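The ablation requested in major comment 1 can be prototyped cheaply at small scale: run each candidate shape once, let an error-based pruner drop the failures, then re-insert the pruned shapes with instrumented overrides and count how many had already reached a hook. The harness below is a hypothetical sketch; `Hooked`, `hits`, `CANDIDATES`, and `ablation` are invented for illustration, and a real study would measure branch coverage rather than a hook-hit list.

```python
hits = []

class Hooked:
    """Records each override call, then returns a caller-chosen value."""
    def __init__(self, ret):
        self.ret = ret

    def __len__(self):
        hits.append("__len__")
        return self.ret

    def __hash__(self):
        hits.append("__hash__")
        return self.ret

# Hypothetical candidate shapes, keyed by a descriptive name.
CANDIDATES = {
    "len-ok": lambda: len(Hooked(2)),
    "len-negative": lambda: len(Hooked(-1)),     # ValueError after __len__ ran
    "hash-ok": lambda: hash(Hooked(5)),
    "hash-str": lambda: hash(Hooked("h")),       # TypeError after __hash__ ran
    "int-minus-str": lambda a=1, b="s": a - b,   # TypeError, no hook involved
}

def ablation(candidates, errors=(TypeError, ValueError)):
    kept, pruned = [], []
    for name, op in candidates.items():
        try:
            op()
            kept.append(name)
        except errors:
            pruned.append(name)   # what an error-based pruner would discard
    reached = []
    for name in pruned:           # re-insert pruned shapes and watch the hooks
        hits.clear()
        try:
            candidates[name]()
        except errors:
            pass
        if hits:
            reached.append(name)
    return kept, pruned, reached

kept, pruned, reached = ablation(CANDIDATES)
print(f"{len(reached)} of {len(pruned)} pruned shapes reached an override")
```

The ratio printed at the end is the kind of quantity the referee asks for: the share of pruned shapes that had in fact crossed into an override before failing.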

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on OverrideFuzz. We address each major comment below, clarifying the design rationale for passive reflection and the scope of our evaluation claims. We commit to revisions that strengthen the presentation without overstating results.

Point-by-point responses
  1. Referee: [Abstract] The central claim that passive reflection removes only invalid shapes while preserving all operation shapes capable of triggering boundary bugs is not supported by any reported analysis of pruned shapes or ablation that re-inserts them to measure lost coverage of boundary behaviors; this assumption is load-bearing for the reachability argument.

    Authors: We agree that an explicit analysis of pruned shapes and an ablation re-inserting them would provide stronger support for the reachability claim. Passive reflection prunes solely on the basis of runtime error messages that signal early invalidity (e.g., type or attribute errors) before native dispatch occurs. By construction, any shape that successfully invokes an overriding method and reaches a use-after-free or type-confusion boundary would not emit such an error and therefore would not be pruned. We did not perform the ablation because of the prohibitive cost of re-running full campaigns with unpruned shape sets. We will revise the abstract and add a limitations paragraph that states this assumption explicitly. revision: partial

  2. Referee: [Evaluation, as implied by the abstract's description of bounded runs] The corpus analysis reconstructs inputs matching known patterns, but the short bounded evaluation found no new vulnerabilities and provides no quantitative detail on how error-based pruning avoids false negatives on shapes that would trigger use-after-free or type-confusion when overriding methods are active.

    Authors: The manuscript already states that no new vulnerabilities were found in the bounded window. The corpus reconstruction of known vulnerability patterns supplies concrete evidence that the two-phase generation reaches the intended semantic boundaries. Error-based pruning cannot produce false negatives on bug-triggering shapes because those shapes execute the override hooks without generating the pruning errors. We lack quantitative false-negative measurements because constructing an oracle for all possible boundary-triggering shapes is infeasible. We will expand the evaluation section with further description of the pruning process, its effect on corpus validity rates, and coverage curves with and without passive reflection where feasible. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical observations only

full rationale

The paper presents OverrideFuzz as a two-phase grammar fuzzer whose claims rest on experimental coverage measurements and corpus reconstruction of known vulnerability patterns. No equations, fitted parameters, or derivations are introduced that reduce to the inputs by construction. Passive reflection is described as a heuristic for pruning invalid shapes, but this is presented as an engineering choice whose effectiveness is evaluated empirically rather than proven via self-referential logic or self-citation chains. The central suggestion that semantic-aware generation reaches boundary behaviors is tied directly to observable corpus matches, not to any renamed known result or ansatz smuggled through prior work.
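The "observable corpus matches" can be made concrete with a static matcher. The sketch below is an assumption, not the paper's corpus-analysis method: `matches_index_side_effect` and `MUTATORS` are hypothetical, and the rule is only a coarse approximation of the pattern in the CPython issue titled "UAF when writing to a bytearray with an element implementing __index__ with side-effects". It flags programs in which an `__index__` override calls a mutating method on a name that the program also writes into by subscript. It only parses input with `ast`, so no corpus program is executed.

```python
import ast

# Hypothetical set of method names treated as container mutations.
MUTATORS = {"clear", "pop", "extend", "append", "remove", "insert"}

def mutated_names(func):
    """Names whose mutating methods are called inside `func` (e.g. buf.clear())."""
    names = set()
    for node in ast.walk(func):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in MUTATORS
                and isinstance(node.func.value, ast.Name)):
            names.add(node.func.value.id)
    return names

def matches_index_side_effect(source):
    """True if an __index__ override mutates a name that is subscript-assigned."""
    tree = ast.parse(source)
    victims = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "__index__":
            victims |= mutated_names(node)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Subscript)
                        and isinstance(target.value, ast.Name)
                        and target.value.id in victims):
                    return True
    return False

SAMPLE = """
ba = bytearray(16)
class Evil:
    def __index__(self):
        ba.clear()
        return 0
ba[0] = Evil()
"""
BENIGN = SAMPLE.replace("        ba.clear()\n", "")
print(matches_index_side_effect(SAMPLE), matches_index_side_effect(BENIGN))  # True False
```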

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on a domain assumption about runtime reflection; it introduces no free parameters, fitted constants, or invented entities.

axioms (1)
  • domain assumption: Active and passive reflection on runtime types and error messages can approach semantic correctness without manual API specifications.
    This underpins both the declaration and execution phases and the claim that generation reaches script-native boundaries.

pith-pipeline@v0.9.0 · 5522 in / 1275 out tokens · 45472 ms · 2026-05-14T20:56:07.303541+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  [1] R. Zrelli et al., “Integrating formal methods and automated tools for DO-178C compliance in UAV software,” Information and Software Technology, vol. 194, p. 108068, 2026, doi: 10.1016/j.infsof.2026.108068.

  [2] LLVM Project, “Clang Static Analyzer.” Accessed: Apr. 20, 2026. [Online]. Available: https://clang-analyzer.llvm.org/

  [3] Y. Shoshitaishvili et al., “SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis,” in 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 138–157, doi: 10.1109/SP.2016.17.

  [4] S. Groß, S. Koch, L. Bernhard, T. Holz, and M. Johns, “FUZZILLI: Fuzzing for JavaScript JIT Compiler Vulnerabilities,” in Proceedings 2023 Network and Distributed System Security Symposium, San Diego, CA, USA: Internet Society, 2023, doi: 10.14722/ndss.2023.24290.

  [5] C. Aschermann, T. Holz, P. Jauernig, A.-R. Sadeghi, and D. Teuchert, “NAUTILUS: Fishing for Deep Bugs with Grammars,” in Proceedings 2019 Network and Distributed System Security Symposium, San Diego, CA: Internet Society, 2019, doi: 10.14722/ndss.2019.23412.

  [6] Y. Chen et al., “One Engine to Fuzz 'em All: Generic Language Processor Testing with Semantic Validation,” in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 642–658, doi: 10.1109/SP40001.2021.00071.

  [7] J. Wang, Z. Xie, X. Xie, X. Du, and X. Zhang, “PatchFuzz: Patch Fuzzing for JavaScript Engines,” Information and Software Technology, vol. 194, p. 108087, June 2026, doi: 10.1016/j.infsof.2026.108087.

  [8] C. Zhang, G. Lee, Q. Liu, and M. Payer, “REFLECTA: Reflection-based Scalable and Semantic Scripting Language Fuzzing,” in Proceedings ASIA CCS '25, Hanoi, Vietnam, doi: 10.1145/3708821.3710818.

  [9] A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++: Combining Incremental Steps of Fuzzing Research,” in 14th USENIX Workshop on Offensive Technologies (WOOT 20), USENIX Association, Aug. 2020. [Online]. Available: https://www.usenix.org/conference/woot20/presentation/fioraldi

  [10] J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: Grammar-Aware Greybox Fuzzing,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019, pp. 724–735, doi: 10.1109/ICSE.2019.00081.

  [11] Y. Qiu, “Python interpreter fuzzing using AST-base mutators, based on LibFuzzer.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/Nambers/CPython-AST-Fuzzer

  [12] “UAF when writing to a bytearray with an element implementing __index__ with side-effects.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/issues/91153

  [13] “There is a way to access an underlying mapping in MappingProxyType.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/issues/88004

  [14] D. C. Wang, A. W. Appel, J. L. Korn, and C. S. Serra, “The Zephyr abstract syntax description language,” in Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997, in DSL'97. Santa Barbara, California: USENIX Association, 1997, p. 17.

  [15] LLVM Project, “SanitizerCoverage.” Accessed: Apr. 20, 2026. [Online]. Available: https://clang.llvm.org/docs/SanitizerCoverage.html

  [16] NixOS contributors, “Nix & NixOS | Declarative builds and deployments.” Accessed: Apr. 20, 2026. [Online]. Available: https://nixos.org/

  [17] Python Software Foundation, “release 3.14.3 python/cpython.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/releases/tag/v3.14.3

  [18] Lua Team, “Lua: version history.” Accessed: Apr. 20, 2026. [Online]. Available: https://www.lua.org/versions.html#5.5

  [19] F. Bellard, “QuickJS binary releases.” Accessed: Apr. 20, 2026. [Online]. Available: https://bellard.org/quickjs/binary_releases/

  [20] Ecma International, “ECMAScript 2023 Language Specification.” Accessed: Apr. 20, 2026. [Online]. Available: https://tc39.es/ecma262/2023/

  [21] N. Carlini et al., “Evaluating and mitigating the growing risk of LLM-discovered 0-days.” Accessed: Apr. 20, 2026. [Online]. Available: https://red.anthropic.com/2026/zero-days/

  [22] C. Zhang et al., “SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned,” arXiv:2602.07666, Feb. 2026, doi: 10.48550/arXiv.2602.07666.

APPENDIX A. Override Functions Reference. The table below shows representative examples of override functions discovered by OverrideFuzz’s active reflection phase fo...