arxiv: 2604.19824 · v1 · submitted 2026-04-20 · 💻 cs.SE

Stateful Embedded Fuzzing with Peripheral-Accurate SystemC Virtual Prototypes

Chiara Ghinami, Igor Pontes Tresolavy, Luis Seibt, Nils Bosbach, Rainer Leupers

Pith reviewed 2026-05-10 04:46 UTC · model grok-4.3

keywords: embedded fuzzing · SystemC-TLM · virtual prototypes · AFL++ · peripheral modeling · pre-silicon testing · stateful simulation · embedded software

The pith

Stateful SystemC-TLM virtual prototypes integrated with AFL++ enable realistic embedded software fuzzing that eliminates false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that links the AFL++ fuzzer to a full-system SystemC-TLM simulation so that generated inputs reach peripheral models directly. This allows peripherals to produce authentic responses such as interrupts and FIFO updates during execution. Existing fuzzing methods either simplify peripherals too much, creating misleading results, or demand manual setup that limits scale. The new integration aims to deliver accurate pre-silicon testing for embedded code while keeping the speed and coverage of prior tools. The authors report that, on embedded workloads, the approach eliminates false positives while keeping code coverage and execution performance comparable to state-of-the-art tools.

Core claim

By injecting fuzzer-generated inputs directly into peripheral models inside a stateful SystemC-TLM virtual prototype, the framework lets peripherals trigger natural side effects such as interrupts and FIFO updates. This full-system simulation approach supports fuzzing of realistic embedded software without the accuracy loss of fast user-mode simulators or the manual instrumentation burden of traditional full-system tools.

What carries the argument

The stateful SystemC-TLM virtual prototype, which models peripheral state transitions so that fuzzer inputs produce authentic hardware-like side effects inside the simulation.
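To make the idea of stateful injection concrete, here is a minimal sketch in plain Python. It is not SystemC and not the authors' implementation: `ToyUart`, `inject`, `read_data`, and `run_case` are hypothetical names chosen for illustration. It only shows how a byte delivered to a peripheral model can update a FIFO and raise an interrupt flag that the software side then observes.

```python
from collections import deque


class ToyUart:
    """Hypothetical stateful UART model (plain Python, not SystemC)."""

    def __init__(self, fifo_depth=4):
        self.rx_fifo = deque()
        self.fifo_depth = fifo_depth
        self.irq_pending = False
        self.overruns = 0

    def inject(self, byte):
        """Fuzzer-side entry point: one byte arrives on the RX line."""
        if len(self.rx_fifo) >= self.fifo_depth:
            self.overruns += 1       # FIFO full: the byte is dropped
            return
        self.rx_fifo.append(byte)
        self.irq_pending = True      # side effect: RX interrupt raised

    def read_data(self):
        """Software-side register read: pops one byte from the FIFO."""
        byte = self.rx_fifo.popleft() if self.rx_fifo else 0
        self.irq_pending = bool(self.rx_fifo)
        return byte


def run_case(fuzz_input):
    """Feed one fuzz input into a fresh model and drain it like an ISR would."""
    uart = ToyUart()
    received = []
    for byte in fuzz_input:
        uart.inject(byte)
        # Stand-in for the target's interrupt handler draining the FIFO.
        while uart.irq_pending:
            received.append(uart.read_data())
    return bytes(received), uart
```

In this toy model the interrupt flag and the FIFO level depend on the history of injections and reads, which is the kind of state a stateless stub that returns arbitrary register values would not track.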

If this is right

Pre-silicon testing of embedded software can proceed at larger scale with realistic peripheral interactions.
Fuzzing can be applied to full embedded systems without sacrificing peripheral accuracy or requiring heavy manual instrumentation.
False positives arising from simplified peripheral models are removed while code coverage and execution speed remain comparable to current tools.
The method supports testing before hardware fabrication by keeping simulation fidelity high enough to reflect real peripheral state.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If peripheral state accuracy drives the reduction in false positives, then refinements to virtual prototype timing models could further improve detection reliability.
The same injection pattern might transfer to other coverage-guided fuzzers or different simulation back-ends beyond SystemC.
This style of virtual-prototype fuzzing could support automated regression testing in embedded development pipelines by running in software-only environments.

Load-bearing premise

The SystemC-TLM virtual prototypes must accurately capture peripheral behaviors and state transitions without introducing simulation artifacts that would mask or create false issues.

What would settle it

A direct comparison of crash reports and coverage metrics produced by the framework against the same workload run on physical embedded hardware, checking whether reported issues match and false positives disappear.
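Such a comparison could be scripted along the following lines. This is a hedged sketch only: the function name `classify_crashes` and the crash identifiers are hypothetical, and it assumes crashes from both environments have already been deduplicated into comparable identifiers, which is itself a nontrivial step.

```python
def classify_crashes(tool_crashes, reference_crashes):
    """Compare crash IDs from a fuzzing setup against a reference run.

    Both arguments are iterables of hashable, comparable crash identifiers
    (hypothetical; for example a deduplicated faulting-address string).
    """
    tool = set(tool_crashes)
    reference = set(reference_crashes)
    return {
        "confirmed": sorted(tool & reference),        # seen in both
        "false_positives": sorted(tool - reference),  # tool only
        "missed": sorted(reference - tool),           # reference only
    }


# Illustrative, made-up identifiers; not data from the paper.
report = classify_crashes(
    tool_crashes=["uart_rx_overflow", "can_null_deref"],
    reference_crashes=["uart_rx_overflow"],
)
```

Under this definition, a claim of zero false positives corresponds to an empty `false_positives` list, while the `missed` list would expose issues that the simulation hides.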

Figures

Figures reproduced from arXiv: 2604.19824 by Chiara Ghinami, Igor Pontes Tresolavy, Luis Seibt, Nils Bosbach, Rainer Leupers.

**Figure 1.** Simulation-based fuzz testing. AFL++ [8], a community-maintained fork of the AFL fuzzer [23], is widely used in industry and research for coverage-guided testing. Recently, significant research has focused on enabling embedded software fuzzing by integrating AFL++ with simulators.

**Figure 2.** The steps of the fuzzing workflow. Section 3, Fuzzing Framework: In this section, we present our framework and the adaptations required in the VP so that it can efficiently work with the fuzzer. We then describe how the framework employs injector modules to send fuzz data to peripheral models. Section 3.1, Workflow: Because AFL++ already supports QEMU-based fuzzing, no changes to the fuzzer were needed to add a new VP. In con…

**Figure 4.** No specific fuzzing seeds were given, each run began with a …

**Figure 5.** Code coverage comparison for the various tools.

**Figure 6.** Comparison of the execution/second of the three …

Original abstract

The increasing complexity of embedded software has made comprehensive manual testing impractical, motivating the use of automated techniques such as fuzzing. Coverage-guided fuzzers like AFL++ have shown strong results for conventional software but remain challenging to apply effectively in embedded contexts, where peripheral behaviors play critical roles. Existing approaches either use fast user-mode simulators, sacrificing peripheral realism, or rely on full-system simulators with manual instrumentation, limiting applicability to large-scale software. In this work, we present a novel framework that integrates AFL++ with a stateful SystemC-TLM virtual prototype to enable realistic fuzzing of embedded software. Fuzzer-generated inputs are injected directly into peripheral models, allowing peripherals to trigger natural side effects such as interrupts and FIFO updates. By integrating fuzzing with full-system simulation, our framework advances the effectiveness of pre-silicon testing for embedded systems. Results on embedded workloads show that our approach eliminates false positives while maintaining code coverage and execution performance comparable to state-of-the-art tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper links AFL++ directly to stateful SystemC-TLM peripheral models so fuzz inputs trigger real interrupts and state changes, but the results section gives almost no experimental detail to support the no-false-positives claim.

The main point is a framework that feeds AFL++ inputs straight into stateful SystemC-TLM virtual prototypes instead of user-mode abstractions or heavily instrumented full-system simulators. Peripherals then produce their own side effects, such as interrupts and FIFO updates, during the fuzz run. That combination is presented as new, and it targets a real gap in embedded testing where either realism or scale gets sacrificed.

The paper explains the setup clearly and shows why direct injection into TLM models should cut simulator-induced false positives while keeping the fuzzing loop fast enough for coverage-guided work. That part is useful and practical for anyone doing pre-silicon validation on SoCs with complex peripherals. The approach also avoids the manual instrumentation burden that limits other full-system methods.

On the downside, the abstract and summary only assert that false positives disappear and that coverage and runtime stay comparable to prior tools. The visible material contains no workload list, no account of how false positives were identified or counted, and no numbers or statistical checks. That leaves the central empirical claim without visible support. The assumption that the TLM models are accurate enough not to mask or create their own issues is stated but not tested in the provided text.

This work is for embedded-systems researchers and verification engineers who already use virtual prototypes or want to apply coverage-guided fuzzing to hardware-software interfaces. A reader looking for concrete tooling ideas in that niche would get something usable from the framework description.

I would send it to peer review. The idea is straightforward, the motivation is solid, and the technical direction is worth referee scrutiny even if the current evidence for the performance claims is thin.

Referee Report

2 major / 2 minor

Summary. The paper presents a framework that integrates AFL++ with stateful SystemC-TLM virtual prototypes for fuzzing embedded software. Fuzzer inputs are injected directly into peripheral models to produce realistic side effects such as interrupts and FIFO updates. The authors claim that this eliminates the false positives caused by user-mode abstractions while achieving code coverage and execution performance comparable to state-of-the-art tools on embedded workloads.

Significance. If the empirical claims hold under rigorous validation, the work could meaningfully advance pre-silicon testing for embedded systems by enabling peripheral-accurate fuzzing without the typical trade-offs between simulation speed and realism.

major comments (2)

Experimental Evaluation section: the central claim that the approach 'eliminates false positives' is asserted without any description of the methodology used to identify, count, or classify false positives, the specific workloads chosen, or statistical analysis of results; this leaves the primary contribution unsupported by visible evidence.
Framework Integration section: the description of direct input injection into SystemC-TLM peripheral models and maintenance of state across fuzzing iterations lacks concrete details on implementation (e.g., how interrupts are triggered or how simulation state is reset between test cases), making it impossible to assess whether the claimed realism is achieved without introducing new artifacts.

minor comments (2)

Abstract: the performance and coverage comparison is stated as 'comparable' but provides no quantitative metrics or baselines, which should be summarized even at a high level.
The paper would benefit from a dedicated threats-to-validity subsection addressing the fidelity of the SystemC-TLM models used.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened. We address each major comment below and will revise the paper to provide the requested details and clarifications.

Referee: Experimental Evaluation section: the central claim that the approach 'eliminates false positives' is asserted without any description of the methodology used to identify, count, or classify false positives, the specific workloads chosen, or statistical analysis of results; this leaves the primary contribution unsupported by visible evidence.

Authors: We agree that the Experimental Evaluation section would benefit from an explicit description of the false-positive identification methodology. In the manuscript, false positives are characterized as crashes or anomalous behaviors observed under user-mode abstractions that do not occur on real hardware due to missing peripheral state; our results demonstrate zero such cases for the proposed approach across the evaluated embedded workloads while baselines exhibit them. To address the concern, we will add a dedicated subsection in the revised Experimental Evaluation that defines the classification criteria, lists the specific workloads (including benchmark names and sizes), and includes basic statistical reporting on the observed differences. This will make the supporting evidence fully transparent.

revision: yes

Referee: Framework Integration section: the description of direct input injection into SystemC-TLM peripheral models and maintenance of state across fuzzing iterations lacks concrete details on implementation (e.g., how interrupts are triggered or how simulation state is reset between test cases), making it impossible to assess whether the claimed realism is achieved without introducing new artifacts.

Authors: We acknowledge that additional implementation specifics are required for reproducibility and to confirm that no new artifacts are introduced. The current description focuses on the high-level architecture; in the revision we will expand the Framework Integration section with concrete mechanisms, such as updating peripheral registers and raising TLM interrupt notifications upon input injection, and using SystemC checkpoint/restore facilities to reset simulation state between iterations while preserving only the necessary peripheral context. These additions will allow readers to evaluate the realism of the side effects.

revision: yes
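The reset question raised in this exchange can be illustrated with a small sketch. It is an assumption-laden stand-in, not the mechanism used in the paper: `ToyTimer` and `fuzz_iterations` are hypothetical names, and `copy.deepcopy` of a Python object merely plays the role that a checkpoint/restore facility would play in a real simulator.

```python
import copy


class ToyTimer:
    """Hypothetical peripheral whose state persists across accesses."""

    def __init__(self):
        self.counter = 0
        self.irq_count = 0

    def inject(self, ticks):
        """Advance the 8-bit counter; each wrap-around raises one interrupt."""
        total = self.counter + ticks
        self.irq_count += total // 256
        self.counter = total % 256


def fuzz_iterations(inputs):
    """Run each input from the same snapshot so test cases stay independent."""
    timer = ToyTimer()
    timer.inject(10)                     # stand-in for boot-time initialization
    snapshot = copy.deepcopy(timer)      # taken once, after initialization
    results = []
    for data in inputs:
        timer = copy.deepcopy(snapshot)  # reset: every case starts identically
        for byte in data:
            timer.inject(byte)
        results.append((timer.counter, timer.irq_count))
    return results
```

Without the per-iteration restore, the outcome of one input would depend on every input that ran before it, which makes crashes hard to reproduce. Whether a given reset scheme itself introduces artifacts is exactly what the referee asks the authors to document.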

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes a framework for integrating AFL++ fuzzing with stateful SystemC-TLM virtual prototypes, where inputs are injected into peripheral models to produce realistic side effects like interrupts. Central claims of eliminating false positives while preserving coverage and performance rest on empirical comparisons to prior tools, not on any derivation that reduces to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or uniqueness theorems are invoked in the abstract or summary that presuppose the result; the logic follows directly from avoiding user-mode abstractions when models are accurate, with model fidelity treated as an external assumption rather than an internal tautology. The work is self-contained via experimental validation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the work appears to rely on standard SystemC-TLM modeling assumptions and AFL++ usage.

Reference graph

Works this paper leans on

32 extracted references · 5 canonical work pages

[1] Fabrice Bellard. 2005. QEMU, a fast and portable dynamic translator. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC '05). USENIX Association, Anaheim, CA, 41.

[2] Marcel Böhme, Valentin J. M. Manès, and Sang Kil Cha. 2023. Boosting fuzzer efficiency: an information theoretic perspective. Commun. ACM, 66, 11, (Oct. 2023), 89–97. doi:10.1145/3611019

[3] Bosch. [n. d.] MCAN User Manual. https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_users_manual_v331.pdf

[4] Peng Chen and Hao Chen. 2018. Angora: efficient fuzzing by principled search. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 711–725.

[5] Abraham A. Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. 2020. Halucinator: firmware re-hosting through abstraction layer emulation. In Proceedings of the 29th USENIX Conference on Security Symposium (SEC'20). Article 68. USENIX Association, USA, 18 pages. isbn: 978-1-939133-17-5

[6] Bo Feng. 2020. P2IM GitHub page. (2020). https://github.com/RiS3-Lab/p2im

[7] Bo Feng, Alejandro Mera, and Long Lu. 2020. P2IM: scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In 29th USENIX Security Symposium (USENIX Security 20), 1237–1254.

[8] Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. AFL++: combining incremental steps of fuzzing research. In 14th USENIX Workshop on Offensive Technologies (WOOT 20).

[9] Andrea Fioraldi, Dominik Christian Maier, Dongjia Zhang, and Davide Balzarotti. 2022. LibAFL: a framework to build modular and reusable fuzzers. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 1051–1065.

[10] Patrice Godefroid, Michael Y Levin, David A Molnar, et al. 2008. Automated whitebox fuzz testing. In NDSS. Vol. 8, 151–166.

[11] Vladimir Herdt, Daniel Große, Jonas Wloka, Tim Güneysu, and Rolf Drechsler. 2020. Verification of embedded binaries using coverage-guided fuzzing with SystemC-based virtual prototypes. In Proceedings of the 2020 on Great Lakes Symposium on VLSI (GLSVLSI '20). Association for Computing Machinery, Virtual Event, China, 101–106. isbn: 9781450379441. doi:10.1145/3386263.3406899

[12] Doug Jacobson. 2023. Car thieves can hack into today's computerized vehicles. (2023). https://www.scientificamerican.com/article/to-steal-todays-computerized-cars-thieves-go-high-tech

[13] MachineWare. 2025. MachineWare website. (2025). https://www.machineware.de/

[14] MachineWare. [n. d.] VCML. https://github.com/machineware-gmbh/vcml

[15] Sanoop Mallissery and Yu-Sung Wu. 2023. Demystify the fuzzing methods: a comprehensive survey. ACM Comput. Surv., 56, 3, Article 71, (Oct. 2023), 38 pages. doi:10.1145/3623375

[16] Valentin JM Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J Schwartz, and Maverick Woo. 2019. The art, science, and engineering of fuzzing: a survey. IEEE Transactions on Software Engineering, 47, 11, 2312–2331.

[17] Packetlabs. 2023. The dark art of UART hacking. (2023). https://www.packetlabs.net/posts/the-dark-art-of-uart-hacking

[18] Tobias Scharnowski, Nils Bars, Moritz Schloegel, Eric Gustafson, Marius Muench, Giovanni Vigna, Christopher Kruegel, Thorsten Holz, and Ali Abbasi. 2022. Fuzzware: using precise MMIO modeling for effective firmware fuzzing. In 31st USENIX Security Symposium (USENIX Security 22), 1239–1256.

[19] Nordic Semiconductor. [n. d.] Nrfx drivers. https://github.com/NordicSemiconductor/nrfx

[20] SystemC. 2025. SystemC website. (2025). https://systemc.org/

[21] Dr. Ken Tindell. 2023. The CAN injection attack. (2023). https://www.can-cia.org/fileadmin/cia/documents/publications/cnlm/june_2023/cnlm_23-2_p20_the_can_injection_attack_ken_tindel_canis_automotive_labs.pdf

[22] Zhenkun Yang, Yuriy Viktorov, Jin Yang, Jiewen Yao, and Vincent Zimmer. 2020. UEFI firmware fuzzing with Simics virtual platform. In 2020 57th ACM/IEEE Design Automation Conference (DAC), 1–6. doi:10.1109/DAC18072.2020.9218694

[23] Michal Zalewski. 2025. American fuzzy lop website. (2025). https://lcamtuf.coredump.cx/afl/

[24] Zephyr. 2025. Babbling Zephyr example. (2025). https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/drivers/can/babbling

[25] Zephyr. 2025. Passthrough Zephyr example. (2025). https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/drivers/uart/passthrough

[26] Zephyr Project. 2024. Zephyr RTOS. https://www.zephyrproject.org/. Accessed: 2025-11-03. (2024)

[27] Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and Limin Sun. 2019. Firm-AFL: high-throughput greybox fuzzing of IoT firmware via augmented process emulation. In Proceedings of the 28th USENIX Conference on Security Symposium (SEC'19). USENIX Association, Santa Clara, CA, USA, 1099–1114. isbn: 9781939133069

[28] Xiaogang Zhu, Sheng Wen, Seyit Camtepe, and Yang Xiang. 2022. Fuzzing: a survey for roadmap. ACM Comput. Surv., 54, 11s, Article 230, (Sept. 2022), 36 pages. doi:10.1145/3512345