arxiv: 2604.13693 · v1 · submitted 2026-04-15 · 💻 cs.SE

Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference

Ruiying Zeng , Shuyao Jiang , Wenxuan Zhao , Yangfan Zhou

Pith reviewed 2026-05-10 13:05 UTC · model grok-4.3

keywords WebAssembly · performance debugging · mutation testing · runtime optimization · compiler defects · Wasmtime · machine code comparison

The pith

WarpL locates exact suboptimal machine instructions causing slowdowns in WebAssembly runtimes by mutating programs to remove the issue and comparing the emitted code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Performance problems in WebAssembly runtimes often trace back to how the runtime compiles the input program into machine instructions rather than to the program itself. Existing debugging methods aimed at application code do not isolate these compiler-level choices. WarpL creates a mutant version of the program that behaves similarly but runs without the slowdown, then compares the machine code produced for both versions to identify the precise instruction sequences responsible. The paper reports that this identified the exact cause in 10 of 12 real issues across three runtimes, and that the authors used WarpL to diagnose six previously unknown performance issues in Wasmtime. A reader would care because it converts opaque runtime slowness into concrete, fixable compiler defects.

Core claim

The paper claims that many performance issues in WebAssembly runtimes arise from specific suboptimal instruction sequences generated during compilation of the input program. These sequences can be isolated by first obtaining a functionally similar mutant program in which the performance problem does not appear, then directly comparing the machine code emitted for the original and the mutant to reveal the differing instructions that produce the slowdown.

What carries the argument

Mutation-based inference: the process of generating a functionally similar program variant that lacks the performance issue, followed by differencing of the two resulting machine-code sequences to isolate the suboptimal instructions.
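
The differencing step can be pictured with a short sketch. It assumes the two disassembly listings are already available as lists of instruction strings; the `normalize` helper, the sample listings, and the use of Python's `difflib` matcher are illustrative assumptions and not WarpL's actual implementation.

```python
import difflib
import re


def normalize(instr: str) -> str:
    """Strip a leading address such as '0x130:' and collapse whitespace,
    so that only the instruction text itself is compared."""
    instr = re.sub(r"^\s*(0x)?[0-9a-fA-F]+:\s*", "", instr)
    return " ".join(instr.split())


def differing_instructions(original, mutant):
    """Return (only_in_original, only_in_mutant) for two instruction listings.

    difflib.SequenceMatcher is used here as a stand-in for whatever
    sequence-alignment algorithm a real tool would choose.
    """
    a = [normalize(i) for i in original]
    b = [normalize(i) for i in mutant]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    only_original, only_mutant = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            only_original.extend(a[i1:i2])
            only_mutant.extend(b[j1:j2])
    return only_original, only_mutant


# Hypothetical listings: the slow version keeps a division that the mutant lacks.
slow = ["0x10: mov eax, edi", "0x12: xor edx, edx", "0x14: div ecx", "0x16: ret"]
fast = ["0x10: mov eax, edi", "0x12: ret"]
print(differing_instructions(slow, fast))
```

Addresses are stripped before matching because any inserted or removed instruction shifts every later address, which would otherwise make all subsequent lines look different.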

If this is right

Runtime developers gain a narrower, actionable set of instructions to inspect when fixing compilation-related slowdowns.
The same technique can surface previously unknown issues inside production runtimes such as Wasmtime.
The method applies across multiple independent WebAssembly runtimes without requiring changes to their internal code.
Debugging effort shifts from broad profiling to targeted comparison of two small code differences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the mutation step can be made fully automatic and fast, the approach could be embedded in nightly compiler test suites to catch regressions early.
The same differencing idea might extend to other virtual-machine languages where performance depends on ahead-of-time or just-in-time code generation.
Repeated application on a family of similar programs could reveal recurring patterns in suboptimal code generation that compiler writers could then prevent.
When the identified instructions involve specific optimizations, the findings could guide targeted improvements in the code-generation passes of the runtimes.

Load-bearing premise

The approach assumes that a functionally similar mutant program without the performance issue can be reliably obtained, and that the differences in the resulting machine code directly and completely identify the root-cause suboptimal instructions.

What would settle it

A documented performance issue in one of the tested runtimes for which no mutant can be generated that removes the slowdown, or for which the machine-code differences fail to match the actual root cause confirmed by manual inspection of the compiler.
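
The first falsifying outcome, a case where no mutant removes the slowdown, can be pictured as a search that comes back empty. The sketch below is a generic acceptance loop under stated assumptions: `find_fast_mutant`, its `speedup` threshold, and the injected `same_behaviour` and `run_time` callables are all hypothetical names, and the page does not describe how WarpL actually generates or selects mutants.

```python
from typing import Callable, Iterable, Optional, TypeVar

P = TypeVar("P")  # any program representation, e.g. a path to a .wasm file


def find_fast_mutant(
    original: P,
    mutants: Iterable[P],
    same_behaviour: Callable[[P, P], bool],
    run_time: Callable[[P], float],
    speedup: float = 2.0,
) -> Optional[P]:
    """Return the first mutant that behaves like `original` on the available
    checks and runs at least `speedup` times faster, or None if there is none."""
    baseline = run_time(original)
    for candidate in mutants:
        if not same_behaviour(original, candidate):
            continue  # the mutation changed observable behaviour; discard it
        if run_time(candidate) * speedup <= baseline:
            return candidate  # the slowdown no longer manifests
    return None


# Toy stand-ins: programs are names, timings and equivalence are looked up.
times = {"orig": 10.0, "m1": 9.5, "m2": 3.0, "m3": 1.0}
equivalent = {"m1", "m2"}  # m3 is fast but changes behaviour
print(find_fast_mutant("orig", ["m1", "m3", "m2"],
                       lambda o, c: c in equivalent, times.__getitem__))
```

A `None` result for a documented issue would correspond to the first settling case described above; the second case, a mismatch between the diff and the true root cause, cannot be detected by a loop like this and needs manual inspection.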

Figures

Figures reproduced from arXiv: 2604.13693 by Ruiying Zeng, Shuyao Jiang, Wenxuan Zhao, Yangfan Zhou.

**Figure 2.** An example Wasm program and the workflow of its execution on a Wasm runtime.

**Figure 3.** The workflow of WarpL.

**Figure 5.** Code related to Issue #8573. Adjacent text from the paper: "… nearly identical. This prompted developers to notice a subtle yet critical difference: the starting address of func2. In the original program, func2 begins at address 0x130, which is 16-byte aligned. However, modern CPU frontends typically fetch 32-byte or 64-byte aligned chunks from the instruction cache. When a function starts mid-cache-line, additional fetches may be require…"
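
The alignment arithmetic in the Figure 5 excerpt can be checked directly. This is a minimal worked example; the address 0x130 and the 16-, 32- and 64-byte sizes come from the excerpt, while the helper name `offset_in_block` is an assumption made for illustration.

```python
def offset_in_block(address: int, block_size: int) -> int:
    """Byte offset of `address` inside its aligned block of `block_size` bytes.
    An offset of 0 means the address is aligned to `block_size`."""
    return address % block_size


FUNC2_START = 0x130  # start address of func2 quoted in the excerpt (304 decimal)

for size in (16, 32, 64):
    print(size, offset_in_block(FUNC2_START, size))
# 16 0   -> 16-byte aligned
# 32 16  -> starts 16 bytes into a 32-byte block
# 64 48  -> starts 48 bytes into a 64-byte block
```

So the function is aligned at the 16-byte granularity yet begins in the middle of both a 32-byte and a 64-byte block, which is the situation the excerpt describes. Which block size matters on a given machine depends on the CPU.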

**Figure 4.** Code related to Issue #7085. The left side shows …

**Figure 8.** Code related to Issue #7246. The highlighted con…

**Figure 7.** Code related to Issue #9590. Adjacent text from the paper: "… the divisor is 0, then the runtime should raise an error. Although the divisor is a known constant (1), existing side-effect analysis in Wasmtime does not consider constant values, and thus fails to eliminate the division. Without WarpL, identifying the specific optimization missed by Wasmtime would be considerably more time-consuming, especially since the WasmEdge-compiled x86…"
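
The missed optimization in the Figure 7 excerpt can be illustrated on a toy expression tree. WebAssembly integer division traps on a zero divisor (and signed division also traps on overflow), so a division has a possible side effect in general; when the divisor is the constant 1, neither trap can occur and the result equals the dividend. The sketch below is an illustration under those assumptions: the `Const`, `Var`, `Div` types and the `simplify` function are hypothetical and are not Wasmtime or Cranelift code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Div:
    dividend: "Expr"
    divisor: "Expr"


Expr = Union[Const, Var, Div]


def simplify(expr: Expr) -> Expr:
    """Remove divisions whose divisor is the constant 1.

    Division by 1 cannot trap and returns the dividend unchanged, so dropping
    it preserves both the value and the absence of a trap."""
    if isinstance(expr, Div):
        dividend = simplify(expr.dividend)
        divisor = simplify(expr.divisor)
        if divisor == Const(1):
            return dividend
        return Div(dividend, divisor)  # divisor unknown: the trap check must stay
    return expr


print(simplify(Div(Var("x"), Const(1))))   # Var(name='x')
print(simplify(Div(Var("x"), Var("y"))))   # unchanged: y might be zero
```

The second call shows why a side-effect analysis that ignores constant values has to keep the division: without knowing the divisor, it cannot rule out the trap.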

Original abstract

Performance debugging in WebAssembly (Wasm) runtimes is essential for ensuring the robustness of Wasm, especially since performance issues have frequently occurred in Wasm runtimes, which can significantly degrade the capabilities of hosted services. Many performance issues in Wasm runtimes result from suboptimal compilation of input Wasm programs, for which existing performance debugging methods primarily designed for application-level inefficiencies are not well-suited. In this paper, we present WarpL, a novel mutation-based approach that aims to identify the exact suboptimal instruction sequences responsible for the performance issues in Wasm runtimes, thereby narrowing down the root causes. Specifically, WarpL obtains a functionally similar mutant in which the performance issue does not manifest, and isolates the exact suboptimal instructions by comparing the machine code of the original and mutated programs. We implement WarpL as an open-source tool and evaluate it on 12 real-world performance issues across three widely used Wasm runtimes. WarpL identified the exact causes in 10 out of 12 issues. Notably, we have used WarpL to successfully diagnose six previously unknown performance issues in Wasmtime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents WarpL, a mutation-based approach for debugging performance issues in WebAssembly runtimes. It generates a functionally similar mutant program without the performance issue and isolates the exact suboptimal instruction sequences by comparing the machine code produced for the original and mutant programs. The tool is implemented as open-source software and evaluated on 12 real-world performance issues across three Wasm runtimes, identifying the exact causes in 10 cases. The authors also report using WarpL to diagnose six previously unknown performance issues in Wasmtime.

Significance. If the isolation procedure can be shown to reliably attribute slowdowns to specific instructions without confounding compiler side-effects, WarpL would provide a practical advance for diagnosing compilation-related performance problems in Wasm runtimes that are not well-served by existing application-level debuggers. The open-source release, success on real issues, and discovery of new bugs in a production runtime constitute concrete strengths that would support adoption and further research in runtime systems and software engineering.

major comments (2)

[Abstract (method description)] The core claim in the abstract that machine-code comparison 'isolates the exact suboptimal instructions' is load-bearing for the contribution but under-supported: the described procedure provides no mechanism to distinguish root-cause differences from side-effect changes in compiler decisions (e.g., altered inlining thresholds, register pressure, or loop unrolling heuristics) that a mutation can trigger even when functional equivalence holds on the test suite.
[Evaluation] Evaluation on the 12 issues reports success in 10/12 cases but supplies no details on mutant generation strategy, verification that functional equivalence is preserved beyond the test suite, or controls that would confirm the identified instructions are necessary and sufficient for the observed slowdown.

minor comments (1)

[Abstract] The abstract could briefly note the key assumptions of the mutation approach to help readers assess the scope of the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of our claims and evaluation that require clarification and expansion. We address each major comment below and will revise the manuscript to incorporate additional details, caveats, and supporting information while preserving the core contribution.

Referee: The core claim in the abstract that machine-code comparison 'isolates the exact suboptimal instructions' is load-bearing for the contribution but under-supported: the described procedure provides no mechanism to distinguish root-cause differences from side-effect changes in compiler decisions (e.g., altered inlining thresholds, register pressure, or loop unrolling heuristics) that a mutation can trigger even when functional equivalence holds on the test suite.

Authors: We agree that the mutation-based comparison does not include an explicit mechanism to rule out all possible compiler side-effects, and that functional equivalence on the test suite alone does not guarantee identical optimization decisions. Our defense rests on the empirical observation that, in the 10 successfully diagnosed cases out of 12, the identified instruction differences corresponded to the performance issues, and that WarpL also diagnosed six previously unknown issues in Wasmtime that were subsequently confirmed and fixed by developers. In the revised manuscript we will (1) tone down the abstract claim to 'helps isolate the primary suboptimal instruction sequences' and (2) add a dedicated limitations subsection discussing potential side-effects and the conditions under which the approach is most reliable. We will also describe the mutation operators chosen to minimize disruptive changes to control flow and inlining. revision: yes
Referee: Evaluation on the 12 issues reports success in 10/12 cases but supplies no details on mutant generation strategy, verification that functional equivalence is preserved beyond the test suite, or controls that would confirm the identified instructions are necessary and sufficient for the observed slowdown.

Authors: We acknowledge that the current evaluation section is concise and omits several methodological details. In the revision we will expand it with: (a) a precise description of the mutant-generation strategy, including the set of mutation operators, search procedure, and termination criteria; (b) the exact verification steps performed beyond the original test suite (including additional fuzzing and differential execution checks where feasible); and (c) per-case arguments, supported by manual inspection and performance measurements, showing why the differing instructions are necessary and sufficient for the slowdown in the ten successful cases. For the two unsuccessful cases we will add an explicit discussion of why the approach did not isolate a root cause. revision: yes
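
The "differential execution checks" promised in response (b) could look roughly like the following. This is a minimal sketch, assuming the original and mutant can each be invoked as a Python callable over 32-bit integer arguments; `differential_check`, `outcome`, and the random-input strategy are hypothetical and say nothing about what the authors will actually implement.

```python
import random


def outcome(fn, args):
    """Run fn and capture either its result or the kind of failure it raised,
    so that a trap in one version but not the other counts as a difference."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("trap", type(exc).__name__)


def differential_check(original, mutant, arity, trials=1000, seed=0):
    """Return the first argument tuple on which the two versions disagree,
    or None if no disagreement was found within `trials` random inputs."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        args = tuple(rng.randint(-2**31, 2**31 - 1) for _ in range(arity))
        if outcome(original, args) != outcome(mutant, args):
            return args
    return None


# Toy stand-ins for an original program and two candidate mutants.
original = lambda x, y: x // 1 + y
good_mutant = lambda x, y: x + y
bad_mutant = lambda x, y: x - y

print(differential_check(original, good_mutant, arity=2))   # None
print(differential_check(original, bad_mutant, arity=2) is not None)
```

A check of this kind can only find counterexamples; a `None` result raises confidence in equivalence but does not prove it, which is the gap the referee's comment points at.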

Circularity Check

0 steps flagged

No significant circularity; procedural tool evaluated externally

The paper describes WarpL as a mutation-based procedural debugging tool that obtains functionally similar mutants and isolates suboptimal instructions via machine-code comparison. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claim (identifying exact causes in 10/12 real-world issues) is supported by evaluation against external, independently verifiable performance problems in Wasmtime and other runtimes rather than by reducing to the method's own outputs or self-citations. The approach does not rename known results, smuggle ansatzes, or rely on load-bearing self-citations; it remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are stated; the approach rests on the domain assumption that performance issues are localized to specific instruction sequences and that mutants preserve semantics while altering performance.

pith-pipeline@v0.9.0 · 5496 in / 1055 out tokens · 31787 ms · 2026-05-10T13:05:31.827764+00:00 · methodology
