arxiv: 2605.12563 · v1 · submitted 2026-05-12 · 💻 cs.CR · cs.PL

OverrideFuzz: Semantic-Aware Grammar Fuzzing for Script-Runtime Vulnerabilities

Yiran Qiu

Pith reviewed 2026-05-14 20:56 UTC · model grok-4.3

keywords grammar fuzzing · script runtimes · semantic-aware generation · override hooks · vulnerability detection · Python · Lua · JavaScript

The pith

OverrideFuzz uses two-phase semantic-aware grammar fuzzing to reach script runtime boundary behaviors that trigger known vulnerability patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OverrideFuzz to test script-language runtimes such as Python, Lua, and JavaScript by building objects that override methods in a declaration phase and then routing operations through those hooks in an execution phase. Active reflection tracks runtime types while passive reflection from error messages prunes invalid shapes, letting the generator approach semantic correctness without hand-written API rules. This targets dynamic rebinding and attribute resolution that standard grammar fuzzers miss, behaviors that can expose use-after-free and type-confusion issues. Evaluation shows steady coverage growth on all three targets, with Lua gaining the most from its metamethod mechanism, and the final corpus contains inputs that match known vulnerability patterns.
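The two phases can be illustrated with a minimal Python sketch. This is not the paper's implementation: `declaration_phase`, `execution_phase`, and `IndexHook` are hypothetical names chosen for illustration, and the example deliberately uses a list subscript, which re-checks bounds after the hook returns, so the runtime reports an ordinary `IndexError` instead of touching stale memory.

```python
# Toy illustration only; all names here are hypothetical, not OverrideFuzz APIs.

def declaration_phase(container):
    """Build an object whose __index__ hook mutates `container` as a side effect."""
    class IndexHook:
        calls = 0

        def __index__(self):
            IndexHook.calls += 1
            container.clear()  # side effect fired from inside the built-in operation
            return 0

    return IndexHook()


def execution_phase(container, hook):
    """Route a built-in subscript operation through the hook and report the outcome."""
    try:
        return ("ok", container[hook])
    except Exception as exc:  # the runtime noticed the inconsistent state
        return ("error", type(exc).__name__)


items = [10, 20, 30]
hook = declaration_phase(items)
outcome = execution_phase(items, hook)
print(outcome, items, type(hook).calls)
```

Here the built-in operation calls back into script code before it finishes, which is the kind of script-native boundary crossing the paper targets. The bug classes named above arise when native code does not re-validate its state after such a callback.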

Core claim

OverrideFuzz is a two-phase, semantic-aware grammar fuzzer. Its declaration phase constructs objects with overriding methods, and its execution phase generates operations routed through those hooks. Active reflection tracks runtime types, and passive reflection uses error messages to remove invalid operation shapes, so that generation reaches the script-native boundary behaviors that can trigger use-after-free or type-confusion bugs. On CPython, Lua, and QuickJS the approach produces consistent coverage growth and a corpus that reconstructs inputs matching known vulnerability patterns.

What carries the argument

Two-phase declaration and execution process with active type tracking and passive error-message reflection to filter operation shapes while preserving boundary-triggering ones.
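The error-message filtering step can be sketched roughly as follows. This is a simplified guess at the idea, not the paper's algorithm: the shape encoding `(operator, left type, right type)`, the `OPS` and `SAMPLES` tables, and `probe_shapes` are all assumptions made for illustration.

```python
import operator

# Hypothetical operation shapes: (operator name, left type name, right type name).
OPS = {"add": operator.add, "mul": operator.mul, "getitem": operator.getitem}
SAMPLES = {"int": 1, "str": "ab", "list": [1, 2, 3, 4]}


def probe_shapes():
    """Try every shape once and split shapes by whether the runtime rejected them."""
    valid, pruned = set(), {}
    for op_name, op in OPS.items():
        for left_name, left in SAMPLES.items():
            for right_name, right in SAMPLES.items():
                shape = (op_name, left_name, right_name)
                try:
                    op(left, right)
                    valid.add(shape)
                except TypeError as exc:
                    # The error message is the passive signal: remember why it failed.
                    pruned[shape] = str(exc)
                except Exception:
                    # Other errors depend on the values, not the shape, so keep it.
                    valid.add(shape)
    return valid, pruned


valid, pruned = probe_shapes()
print(len(valid), "shapes kept,", len(pruned), "shapes pruned")
```

Only `TypeError` counts as a shape-level rejection in this sketch; which error classes a real implementation treats as grounds for pruning is a design choice the summary above does not specify.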

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Longer runs could surface previously unknown vulnerabilities once the corpus stabilizes.
The same declaration-plus-execution pattern could transfer to other dynamic languages that expose metamethods or prototype overrides.
Coverage plateaus suggest that adding explicit modeling of object lifetime or garbage-collection hooks would be a direct next step.

Load-bearing premise

That error messages supply enough detail to discard only invalid shapes, without also removing any operation shape that could actually trigger a boundary bug, and that the short evaluation window is long enough to reveal the fuzzer's reach.

What would settle it

Extend the evaluation window on CPython, Lua, or QuickJS and check whether any newly generated inputs trigger use-after-free or type-confusion bugs that were not found by prior grammar or reflection-based fuzzers.

Figures

Figures reproduced from arXiv: 2605.12563 by Yiran Qiu.

**Figure 2.** CPython coverage growth over time

**Figure 3.** Lua coverage growth over time

**Figure 4.** QuickJS coverage growth over time

**Figure 5.** CPython per-file branches coverage treemap

**Figure 7.** QuickJS per-file branches coverage treemap

Original abstract

Script-language runtimes such as Python, Lua, and JavaScript are widely deployed in security-sensitive contexts, yet they remain difficult to test because valid inputs must satisfy syntax, dynamic type constraints, and object-level semantics. Existing grammar and reflection-based fuzzers improve syntactic validity and interface reachability, but they rarely model override hooks, dynamic rebinding, and attribute-resolution behavior that can redirect built-in operations across the script-native boundary and trigger use-after-free or type-confusion bugs. We present OverrideFuzz, a two-phase, semantic-aware grammar fuzzer for script-language runtimes. Its declaration phase constructs objects with overriding methods, while its execution phase generates operations that route through those hooks. Active reflection tracks runtime types, and passive reflection learns from error messages to remove invalid operation shapes, allowing generation to approach semantic correctness without manual API specification. We evaluate OverrideFuzz on CPython, Lua, and QuickJS. All three targets show consistent coverage growth, with rapid early expansion followed by slower incremental gains, and Lua benefits most from its pervasive metamethod dispatch mechanism. Although OverrideFuzz did not discover novel vulnerabilities during the bounded evaluation period, corpus analysis shows that it reconstructs inputs matching known vulnerability patterns, which suggests that semantic-aware generation reaches the intended script-native boundary behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OverrideFuzz shows a workable two-phase design for generating override-aware inputs in script runtimes and gets measurable coverage, but the evaluation does not yet demonstrate it reliably finds new boundary bugs.

The main takeaway is that this paper builds OverrideFuzz around a declaration phase that sets up overriding objects and an execution phase that routes operations through them, using active reflection for types and passive reflection from error messages to drop invalid shapes. That combination lets it target override hooks and attribute resolution without manual API specs, which is the concrete step past standard grammar fuzzers. On CPython, Lua, and QuickJS it produces consistent coverage growth, with Lua benefiting most from its metamethod dispatch, and the corpus recovers inputs that match known vulnerability patterns. Those results are the solid part of the work.

The evaluation stays within a bounded window and reports no new vulnerabilities, so the claim that semantic-aware generation reaches the intended script-native boundaries rests on coverage curves and pattern matching rather than direct bug discovery. The passive reflection step assumes error messages cleanly separate bad shapes from all shapes that could trigger use-after-free or type-confusion once the overrides are active. If an error appears only because the reflection phase lacks the exact state built in the declaration phase, or if the error is itself a side effect of the boundary crossing, then pruning could remove the very sequences needed. No section lists the pruned shapes or runs an ablation that re-inserts them to check lost coverage of boundary behaviors.

This paper is aimed at researchers who build or tune fuzzers for language interpreters and need practical ways to handle dynamic object semantics. A reader working on runtime security testing would find the implementation choices and the three-target coverage data useful. It deserves a serious referee because the core mechanism is implemented, the targets are real, and the coverage results are reported with enough detail to discuss.

I would send it for review and ask specifically for more evidence on the pruning step and whether longer runs or different seed strategies surface new issues.
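The state-dependence concern in the letter can be made concrete with a small Python sketch. It is an illustration of the worry, not evidence about OverrideFuzz itself: `shape_is_rejected` and `Box` are hypothetical names, and whether the paper's probing ever runs without the declared overrides in place is not stated in the material summarized here.

```python
def shape_is_rejected(op, value):
    """Return True when the runtime rejects applying `op` to `value` with a TypeError."""
    try:
        op(value)
        return False
    except TypeError:
        return True


class Box:
    """A plain object with no overrides yet."""


# Probed before any override exists, the shape (len, Box) looks invalid.
rejected_before = shape_is_rejected(len, Box())

# A declaration-style step installs the hook on the class.
Box.__len__ = lambda self: 7

# The same shape is now valid and routes through script-defined code.
rejected_after = shape_is_rejected(len, Box())

print(rejected_before, rejected_after)
```

A verdict cached from the first probe would prune a shape that becomes valid, and hook-reaching, once the override is declared. This is why the order of probing and declaration matters for the pruning assumption.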

Referee Report

2 major / 1 minor

Summary. OverrideFuzz is a two-phase semantic-aware grammar fuzzer for script-language runtimes (CPython, Lua, QuickJS) that constructs objects with overriding methods in a declaration phase and generates operations routed through those hooks in an execution phase. Active reflection tracks runtime types while passive reflection learns from error messages to prune invalid operation shapes, enabling semantic correctness without manual API specifications. Evaluation reports consistent coverage growth across targets (with Lua benefiting most from metamethod dispatch) and corpus analysis showing reconstruction of inputs that match known vulnerability patterns, although no novel vulnerabilities were discovered during the bounded evaluation period.

Significance. If the two-phase approach with passive reflection successfully models override hooks, dynamic rebinding, and attribute resolution to reach script-native boundary behaviors, it would advance automated testing of widely deployed script runtimes for use-after-free and type-confusion issues. The reported reconstruction of known vulnerability patterns provides concrete evidence that generation reaches intended semantic boundaries, and the absence of free parameters or fitted constants in the core mechanism is a strength; however, the lack of new discoveries and limited analysis of pruning effects reduce the demonstrated security impact.

major comments (2)

[Abstract] The central claim that passive reflection removes only invalid shapes while preserving all operation shapes capable of triggering boundary bugs is not supported by any reported analysis of pruned shapes or ablation that re-inserts them to measure lost coverage of boundary behaviors; this assumption is load-bearing for the reachability argument.
[Evaluation] (Implied in the abstract's description of bounded runs.) The corpus analysis reconstructs inputs matching known patterns, but the short bounded evaluation found no new vulnerabilities and provides no quantitative detail on how error-based pruning avoids false negatives on shapes that would trigger use-after-free or type-confusion when overriding methods are active.

minor comments (1)

The description of coverage growth as 'rapid early expansion followed by slower incremental gains' would be strengthened by explicit reference to the time-series data or plots that support this characterization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on OverrideFuzz. We address each major comment below, clarifying the design rationale for passive reflection and the scope of our evaluation claims. We commit to revisions that strengthen the presentation without overstating results.

Referee: [Abstract] The central claim that passive reflection removes only invalid shapes while preserving all operation shapes capable of triggering boundary bugs is not supported by any reported analysis of pruned shapes or ablation that re-inserts them to measure lost coverage of boundary behaviors; this assumption is load-bearing for the reachability argument.

Authors: We agree that an explicit analysis of pruned shapes and an ablation re-inserting them would provide stronger support for the reachability claim. Passive reflection prunes solely on the basis of runtime error messages that signal early invalidity (e.g., type or attribute errors) before native dispatch occurs. By construction, any shape that successfully invokes an overriding method and reaches a use-after-free or type-confusion boundary would not emit such an error and therefore would not be pruned. We did not perform the ablation because of the prohibitive cost of re-running full campaigns with unpruned shape sets. We will revise the abstract and add a limitations paragraph that states this assumption explicitly.

revision: partial

Referee: [Evaluation] (Implied in the abstract's description of bounded runs.) The corpus analysis reconstructs inputs matching known patterns, but the short bounded evaluation found no new vulnerabilities and provides no quantitative detail on how error-based pruning avoids false negatives on shapes that would trigger use-after-free or type-confusion when overriding methods are active.

Authors: The manuscript already states that no new vulnerabilities were found in the bounded window. The corpus reconstruction of known vulnerability patterns supplies concrete evidence that the two-phase generation reaches the intended semantic boundaries. Error-based pruning cannot produce false negatives on bug-triggering shapes because those shapes execute the override hooks without generating the pruning errors. We lack quantitative false-negative measurements because constructing an oracle for all possible boundary-triggering shapes is infeasible. We will expand the evaluation section with further description of the pruning process, its effect on corpus validity rates, and coverage curves with and without passive reflection where feasible.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical observations only

The paper presents OverrideFuzz as a two-phase grammar fuzzer whose claims rest on experimental coverage measurements and corpus reconstruction of known vulnerability patterns. No equations, fitted parameters, or derivations are introduced that reduce to the inputs by construction. Passive reflection is described as a heuristic for pruning invalid shapes, but this is presented as an engineering choice whose effectiveness is evaluated empirically rather than proven via self-referential logic or self-citation chains. The central suggestion that semantic-aware generation reaches boundary behaviors is tied directly to observable corpus matches, not to any renamed known result or ansatz smuggled through prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on domain assumptions about runtime reflection rather than new entities or fitted parameters; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Active and passive reflection on runtime types and error messages can approach semantic correctness without manual API specifications.
This underpins both the declaration and execution phases and the claim that generation reaches script-native boundaries.
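As a rough illustration of what reflection on runtime types can provide without a hand-written API specification, the Python sketch below records the type of each live value and lists the special methods its type exposes. The helper names `track_types` and `overridable_hooks` are hypothetical, and the paper's own reflection mechanism may work quite differently.

```python
def track_types(env):
    """Map each variable name to the name of its current runtime type."""
    return {name: type(value).__name__ for name, value in env.items()}


def overridable_hooks(value):
    """List callable double-underscore methods that the value's type exposes."""
    cls = type(value)
    return sorted(
        name
        for name in dir(cls)
        if name.startswith("__")
        and name.endswith("__")
        and callable(getattr(cls, name, None))
    )


env = {"n": 1, "s": "ab", "xs": [1, 2, 3]}
types = track_types(env)
hooks = {name: overridable_hooks(value) for name, value in env.items()}

print(types)
print("__index__" in hooks["n"], "__index__" in hooks["s"])
```

A generator could use such a table to decide which hooks are worth overriding for a given type, for example `__index__` on integer-like objects but not on strings.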

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1] R. Zrelli et al., “Integrating formal methods and automated tools for DO-178C compliance in UAV software,” Information and Software Technology, vol. 194, p. 108068, 2026, doi: 10.1016/j.infsof.2026.108068

[2] LLVM Project, “Clang Static Analyzer.” Accessed: Apr. 20, 2026. [Online]. Available: https://clang-analyzer.llvm.org/

[3] Y. Shoshitaishvili et al., “SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis,” in 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 138–157. doi: 10.1109/SP.2016.17

[4] S. Groß, S. Koch, L. Bernhard, T. Holz, and M. Johns, “FUZZILLI: Fuzzing for JavaScript JIT Compiler Vulnerabilities,” in Proceedings 2023 Network and Distributed System Security Symposium, San Diego, CA, USA: Internet Society, 2023. doi: 10.14722/ndss.2023.24290

[5] C. Aschermann, T. Holz, P. Jauernig, A.-R. Sadeghi, and D. Teuchert, “NAUTILUS: Fishing for Deep Bugs with Grammars,” in Proceedings 2019 Network and Distributed System Security Symposium, San Diego, CA: Internet Society, 2019. doi: 10.14722/ndss.2019.23412

[6] Y. Chen et al., “One Engine to Fuzz 'em All: Generic Language Processor Testing with Semantic Validation,” in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 642–658. doi: 10.1109/SP40001.2021.00071

[7] J. Wang, Z. Xie, X. Xie, X. Du, and X. Zhang, “PatchFuzz: Patch Fuzzing for JavaScript Engines,” Information and Software Technology, vol. 194, p. 108087, June 2026, doi: 10.1016/j.infsof.2026.108087

[8] C. Zhang, G. Lee, Q. Liu, and M. Payer, “REFLECTA: Reflection-based Scalable and Semantic Scripting Language Fuzzing,” in Proceedings ASIA CCS '25, Hanoi, Vietnam,

[9] doi: 10.1145/3708821.3710818

[10] A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++ : Combining Incremental Steps of Fuzzing Research,” in 14th USENIX Workshop on Offensive Technologies (WOOT 20), USENIX Association, Aug. 2020. [Online]. Available: https://www.usenix.org/conference/woot20/presentation/fioraldi

[11] J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: Grammar-Aware Greybox Fuzzing,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019, pp. 724–735. doi: 10.1109/ICSE.2019.00081

[12] Y. Qiu, “Python interpreter fuzzing using AST-base mutators, based on LibFuzzer.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/Nambers/CPython-AST-Fuzzer

[13] “UAF when writing to a bytearray with an element implementing __index__ with side-effects.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/issues/91153

[14] “There is a way to access an underlying mapping in MappingProxyType.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/issues/88004

[15] D. C. Wang, A. W. Appel, J. L. Korn, and C. S. Serra, “The Zephyr abstract syntax description language,” in Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997, in DSL'97. Santa Barbara, California: USENIX Association, 1997, p. 17

[16] LLVM Project, “SanitizerCoverage.” Accessed: Apr. 20, 2026. [Online]. Available: https://clang.llvm.org/docs/SanitizerCoverage.html

[17] NixOS contributors, “Nix & NixOS | Declarative builds and deployments.” Accessed: Apr. 20, 2026. [Online]. Available: https://nixos.org/

[18] Python Software Foundation, “release 3.14.3 python/cpython.” Accessed: Apr. 20, 2026. [Online]. Available: https://github.com/python/cpython/releases/tag/v3.14.3

[19] Lua Team, “Lua: version history.” Accessed: Apr. 20, 2026. [Online]. Available: https://www.lua.org/versions.html#5.5

[20] F. Bellard, “QuickJS binary releases.” Accessed: Apr. 20, 2026. [Online]. Available: https://bellard.org/quickjs/binary_releases/

[21] Ecma International, “ECMAScript 2023 Language Specification.” Accessed: Apr. 20, 2026. [Online]. Available: https://tc39.es/ecma262/2023/

[22] N. Carlini et al., “Evaluating and mitigating the growing risk of LLM-discovered 0-days.” Accessed: Apr. 20, 2026. [Online]. Available: https://red.anthropic.com/2026/zero-days/
[23] C. Zhang et al., “SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned,” no. arXiv:2602.07666. arXiv, Feb. 2026. doi: 10.48550/arXiv.2602.07666.