Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference
Pith reviewed 2026-05-10 13:05 UTC · model grok-4.3
The pith
WarpL pinpoints the exact suboptimal machine instructions behind slowdowns in WebAssembly runtimes by finding a mutant of the program in which the slowdown disappears and comparing the machine code emitted for the two versions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that many performance issues in WebAssembly runtimes arise from specific suboptimal instruction sequences generated during compilation of the input program. These sequences can be isolated by first obtaining a functionally similar mutant program in which the performance problem does not appear, then directly comparing the machine code emitted for the original and the mutant to reveal the differing instructions that produce the slowdown.
What carries the argument
Mutation-based inference: the process of generating a functionally similar program variant that lacks the performance issue, followed by differencing of the two resulting machine-code sequences to isolate the suboptimal instructions.
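The differencing step can be sketched as follows. This is a minimal illustration, not WarpL's implementation: it assumes each function's machine code has already been disassembled into a list of normalized instruction strings, and it uses a longest-common-subsequence (LCS) alignment as the diff algorithm, which is our choice for the sketch. Any normalization WarpL applies (such as masking addresses) is not modeled. `ORIG` and `MUTANT` are hypothetical x86-64 fragments in which the mutant's code carries an `xorps` zeroing idiom before `cvtsi2sd`, a common way to break the false dependency that `cvtsi2sd` has on its destination register.

```python
from typing import List, Tuple


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def diff_instructions(orig: List[str], mutant: List[str]) -> Tuple[List[str], List[str]]:
    """Return (only_in_orig, only_in_mutant): instructions left out of one LCS alignment."""
    dp = lcs_table(orig, mutant)
    i = j = 0
    only_orig: List[str] = []
    only_mut: List[str] = []
    while i < len(orig) and j < len(mutant):
        if orig[i] == mutant[j]:
            # Shared instruction: part of the common subsequence.
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            # Skipping orig[i] still allows the longest alignment.
            only_orig.append(orig[i])
            i += 1
        else:
            only_mut.append(mutant[j])
            j += 1
    # Whatever remains after one sequence is exhausted is unmatched.
    only_orig.extend(orig[i:])
    only_mut.extend(mutant[j:])
    return only_orig, only_mut


# Hypothetical disassembly of the same function compiled from the original and the mutant.
ORIG = ["mov eax, [rdi]", "cvtsi2sd xmm0, eax", "addsd xmm1, xmm0", "ret"]
MUTANT = ["mov eax, [rdi]", "xorps xmm0, xmm0", "cvtsi2sd xmm0, eax", "addsd xmm1, xmm0", "ret"]

if __name__ == "__main__":
    removed, added = diff_instructions(ORIG, MUTANT)
    print("only in original:", removed)
    print("only in mutant:  ", added)
```

The instructions outside the common subsequence are the candidates handed to a developer. A diff like this can also pick up changes caused by shifted compiler decisions elsewhere in the function, so the list is a starting point for inspection rather than proof of a root cause.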
If this is right
- Runtime developers gain a narrower, actionable set of instructions to inspect when fixing compilation-related slowdowns.
- The same technique can surface previously unknown issues inside production runtimes such as Wasmtime.
- The method applies across multiple independent WebAssembly runtimes without requiring changes to their internal code.
- Debugging effort shifts from broad profiling to targeted inspection of a small difference between two machine-code sequences.
Where Pith is reading between the lines
- If the mutation step can be made fully automatic and fast, the approach could be embedded in nightly compiler test suites to catch regressions early.
- The same differencing idea might extend to other virtual-machine languages where performance depends on ahead-of-time or just-in-time code generation.
- Repeated application on a family of similar programs could reveal recurring patterns in suboptimal code generation that compiler writers could then prevent.
- When the identified instructions involve specific optimizations, the findings could guide targeted improvements in the code-generation passes of the runtimes.
Load-bearing premise
That a functionally similar mutant program without the performance issue can be reliably obtained, and that the differences in the resulting machine code directly and completely identify the root-cause suboptimal instructions.
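The premise has two halves, and a mutant is only useful if both hold. The sketch below states that acceptance test under assumptions that are ours, not the paper's: functional similarity is reduced to a boolean produced by some separate output comparison, the slowdown is judged from repeated run-time measurements summarized by their median (to damp timing noise), and a mutant counts as issue-free only if it is at least `min_speedup` times faster, with 1.5 as an arbitrary placeholder threshold.

```python
import statistics
from typing import Sequence


def slowdown_removed(orig_times: Sequence[float], mut_times: Sequence[float],
                     min_speedup: float = 1.5) -> bool:
    """True if the mutant's median run time is at least `min_speedup` times lower."""
    return statistics.median(orig_times) >= min_speedup * statistics.median(mut_times)


def accept_mutant(outputs_agree: bool, orig_times: Sequence[float],
                  mut_times: Sequence[float], min_speedup: float = 1.5) -> bool:
    """Hypothetical criterion: the mutant must behave like the original on the
    checked inputs AND no longer exhibit the slowdown."""
    return outputs_agree and slowdown_removed(orig_times, mut_times, min_speedup)


if __name__ == "__main__":
    # Made-up timings (seconds) from three runs of each program.
    print(accept_mutant(True, [3.0, 3.1, 2.9], [1.0, 1.1, 0.9]))
```

The `outputs_agree` flag is only as strong as the inputs it was computed on, which is why reliably obtaining such a mutant is the load-bearing step.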
What would settle it
A documented performance issue in one of the tested runtimes for which no mutant that removes the slowdown can be generated, or for which the machine-code differences do not match the actual root cause as confirmed by manual inspection of the compiler.
Original abstract
Performance debugging in WebAssembly (Wasm) runtimes is essential for ensuring the robustness of Wasm, especially since performance issues have frequently occurred in Wasm runtimes, which can significantly degrade the capabilities of hosted services. Many performance issues in Wasm runtimes result from suboptimal compilation of input Wasm programs, for which existing performance debugging methods primarily designed for application-level inefficiencies are not well-suited. In this paper, we present WarpL, a novel mutation-based approach that aims to identify the exact suboptimal instruction sequences responsible for the performance issues in Wasm runtimes, thereby narrowing down the root causes. Specifically, WarpL obtains a functionally similar mutant in which the performance issue does not manifest, and isolates the exact suboptimal instructions by comparing the machine code of the original and mutated programs. We implement WarpL as an open-source tool and evaluate it on 12 real-world performance issues across three widely used Wasm runtimes. WarpL identified the exact causes in 10 out of 12 issues. Notably, we have used WarpL to successfully diagnose six previously unknown performance issues in Wasmtime.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents WarpL, a mutation-based approach for debugging performance issues in WebAssembly runtimes. It generates a functionally similar mutant program in which the performance issue does not manifest and isolates the suboptimal instruction sequences by comparing the machine code produced for the original and mutant programs. The tool is implemented as open-source software and evaluated on 12 real-world performance issues across three Wasm runtimes, identifying the exact causes in 10 of them; the authors additionally report using it to diagnose six previously unknown performance issues in Wasmtime.
Significance. If the isolation procedure can be shown to reliably attribute slowdowns to specific instructions without confounding compiler side-effects, WarpL would provide a practical advance for diagnosing compilation-related performance problems in Wasm runtimes that are not well-served by existing application-level debuggers. The open-source release, success on real issues, and discovery of new bugs in a production runtime constitute concrete strengths that would support adoption and further research in runtime systems and software engineering.
major comments (2)
- [Abstract (method description)] The core claim in the abstract that machine-code comparison 'isolates the exact suboptimal instructions' is load-bearing for the contribution but under-supported: the described procedure provides no mechanism to distinguish root-cause differences from side-effect changes in compiler decisions (e.g., altered inlining thresholds, register pressure, or loop unrolling heuristics) that a mutation can trigger even when functional equivalence holds on the test suite.
- [Evaluation] Evaluation on the 12 issues reports success in 10/12 cases but supplies no details on mutant generation strategy, verification that functional equivalence is preserved beyond the test suite, or controls that would confirm the identified instructions are necessary and sufficient for the observed slowdown.
minor comments (1)
- [Abstract] The abstract could briefly note the key assumptions of the mutation approach to help readers assess the scope of the claims.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of our claims and evaluation that require clarification and expansion. We address each major comment below and will revise the manuscript to incorporate additional details, caveats, and supporting information while preserving the core contribution.
Point-by-point responses
- Referee: The core claim in the abstract that machine-code comparison 'isolates the exact suboptimal instructions' is load-bearing for the contribution but under-supported: the described procedure provides no mechanism to distinguish root-cause differences from side-effect changes in compiler decisions (e.g., altered inlining thresholds, register pressure, or loop unrolling heuristics) that a mutation can trigger even when functional equivalence holds on the test suite.
Authors: We agree that the mutation-based comparison does not include an explicit mechanism to rule out all possible compiler side-effects, and that functional equivalence on the test suite alone does not guarantee identical optimization decisions. Our defense rests on the empirical observation that, in the 10 of 12 real-world cases where WarpL succeeded, the identified instruction differences corresponded to the performance issues, and that the same procedure diagnosed six previously unknown bugs in Wasmtime that were subsequently confirmed and fixed by developers. In the revised manuscript we will (1) tone down the abstract claim to 'helps isolate the primary suboptimal instruction sequences' and (2) add a dedicated limitations subsection discussing potential side-effects and the conditions under which the approach is most reliable. We will also describe the mutation operators chosen to minimize disruptive changes to control flow and inlining. revision: yes
- Referee: Evaluation on the 12 issues reports success in 10/12 cases but supplies no details on mutant generation strategy, verification that functional equivalence is preserved beyond the test suite, or controls that would confirm the identified instructions are necessary and sufficient for the observed slowdown.
Authors: We acknowledge that the current evaluation section is concise and omits several methodological details. In the revision we will expand it with: (a) a precise description of the mutant-generation strategy, including the set of mutation operators, search procedure, and termination criteria; (b) the exact verification steps performed beyond the original test suite (including additional fuzzing and differential execution checks where feasible); and (c) per-case arguments, supported by manual inspection and performance measurements, showing why the differing instructions are necessary and sufficient for the slowdown in the ten successful cases. For the two unsuccessful cases we will add an explicit discussion of why the approach did not isolate a root cause. revision: yes
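Item (b) of the response refers to differential execution beyond the original test suite. A minimal sketch of that idea, under assumptions of ours: `f_orig` and `f_mut` are hypothetical stand-ins for invoking the same exported function of the original and mutant modules (in practice this would go through a runtime's embedding API, which is not modeled here), the function takes a single i32 argument, and the inputs mix boundary values with seeded random draws. Passing such a check raises confidence in equivalence but does not prove it.

```python
import random
from typing import Callable, Optional

I32_MIN, I32_MAX = -2**31, 2**31 - 1
# Boundary values that often expose wrapping or sign-handling differences.
I32_EDGES = [0, 1, -1, I32_MAX, I32_MIN]


def wrap_i32(v: int) -> int:
    """Reduce a Python int to the two's-complement i32 range used by Wasm."""
    return (v + 2**31) % 2**32 - 2**31


def differential_check(f_orig: Callable[[int], int], f_mut: Callable[[int], int],
                       trials: int = 1000, seed: int = 0) -> Optional[int]:
    """Return the first input on which the two versions disagree, or None."""
    rng = random.Random(seed)
    inputs = I32_EDGES + [rng.randint(I32_MIN, I32_MAX) for _ in range(trials)]
    for x in inputs:
        if f_orig(x) != f_mut(x):
            return x
    return None


# Hypothetical stand-ins for the original and mutated exported function.
def orig_double(x: int) -> int:
    return wrap_i32(x * 2)


def mut_double(x: int) -> int:
    return wrap_i32(x << 1)  # shift left by 1 equals multiply by 2 under i32 wrapping


def bad_double(x: int) -> int:
    return x * 2  # forgets to wrap, so it diverges on overflow


if __name__ == "__main__":
    print(differential_check(orig_double, mut_double))  # None: no disagreement found
    print(differential_check(orig_double, bad_double))  # 2147483647 (I32_MAX overflows)
```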
Circularity Check
No significant circularity; procedural tool evaluated externally
The paper describes WarpL as a mutation-based procedural debugging tool that obtains functionally similar mutants and isolates suboptimal instructions via machine-code comparison. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claim (identifying exact causes in 10/12 real-world issues) is supported by evaluation against external, independently verifiable performance problems in Wasmtime and other runtimes rather than by reducing to the method's own outputs or self-citations. The approach does not rename known results, smuggle ansatzes, or rely on load-bearing self-citations; it remains self-contained against external benchmarks.
Received 18 July 2025; accepted 17 October 2025