arxiv: 2606.12064 · v1 · pith:L6OZBGEKnew · submitted 2026-06-10 · 💻 cs.SE · cs.CR

Undefined Behavior in C and C++: An Experiment With Desktop Use Cases

Pith reviewed 2026-06-27 09:09 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords: undefined behavior, C programming, C++ programming, undefined behavior sanitizer, desktop environment, Linux, Mesa library, empirical experiment

The pith

Undefined behavior is common in typical desktop use of C and C++ programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs an undefined behavior sanitizer on a Linux desktop to measure how often such behavior appears during normal use. Completing 59 simple tasks produced nearly 11,000 unique warnings from 32 programs and libraries. Most warnings trace to the Mesa graphics library during GUI work, and a single GNOME login alone produced over 500 unique warnings. The majority concern virtual table pointers and come with long stack traces. A reader would care because this indicates that constructs the languages leave undefined are not edge cases but part of everyday desktop operation.

Core claim

By completing 59 simple experimental tasks, nearly 11 thousand unique undefined behavior warnings were generated by 32 unique programs and libraries written in C or C++. Of these warnings, most were associated with the Mesa graphics library and generated by interacting with graphical user interfaces. Merely logging into the GNOME desktop environment generated over 500 unique warnings. Of all warnings, the clear majority was about virtual table pointers. The associated stack traces were also lengthy in general.

What carries the argument

An undefined behavior sanitizer (UBSan) implemented in a compiler, used to record warnings while 59 desktop tasks were carried out; the warnings came from 32 programs and libraries written in C or C++.

If this is right

  • Graphics libraries such as Mesa account for the bulk of observed warnings during normal GUI interaction.
  • Virtual table pointer issues form the largest category of warnings.
  • Even basic actions like logging into GNOME produce hundreds of unique warnings.
  • Stack traces tied to the warnings tend to be lengthy.
  • Empirical collection of sanitizer output on real programs is a workable method for quantifying undefined behavior.
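
As a rough illustration of how sanitizer output might be collected per task, the sketch below builds a UBSAN_OPTIONS value and runs a command with it. It assumes the programs were built with -fsanitize=undefined. The helper names (ubsan_environment, run_task) and the log prefix are hypothetical, and this is not a description of the paper's actual collection setup.

```python
import os
import subprocess


def ubsan_environment(log_prefix: str, base_env=None) -> dict:
    """Return a copy of the environment with UBSan runtime options set.

    print_stacktrace=1 asks the runtime to print a stack trace per report.
    log_path=<prefix> sends reports to log files instead of stderr.
    """
    env = dict(os.environ if base_env is None else base_env)
    options = ["print_stacktrace=1", f"log_path={log_prefix}"]
    existing = env.get("UBSAN_OPTIONS")
    if existing:
        # Keep any options the caller already configured.
        options.insert(0, existing)
    env["UBSAN_OPTIONS"] = ":".join(options)
    return env


def run_task(command: list[str], log_prefix: str) -> int:
    """Run one task command with the options above (hypothetical helper)."""
    completed = subprocess.run(
        command, env=ubsan_environment(log_prefix), check=False
    )
    return completed.returncode
```

Keeping one log prefix per task would make it possible to attribute warnings to tasks afterwards, which is the kind of per-task breakdown the figures report.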

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Routine inclusion of sanitizers in desktop software testing could surface these issues earlier in development.
  • The concentration in graphics code suggests targeted review of libraries handling virtual dispatch might reduce warnings.
  • Repeating the experiment on different desktop environments or operating systems could test whether the pattern holds more broadly.
  • Lengthy stack traces imply that the triggering call sequences often cross multiple program boundaries.
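
The point about trace length and program boundaries could be checked with a small log parser. The sketch below counts frames per report and collects the distinct modules named in each trace. The frame layout it assumes ("#N 0xADDR in function (module+0xOFFSET)") and the function name summarize_traces are assumptions for illustration, not something the paper specifies.

```python
import re

# Assumed frame shape: "    #0 0x401136 in func (/path/to/module+0x1136)".
FRAME = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\b")
MODULE = re.compile(r"\((?P<module>[^()\s]+)\+0x[0-9a-fA-F]+\)")


def summarize_traces(lines):
    """Return one dict per report with its frame count and module set."""
    traces = []
    current = None
    for text in lines:
        if "runtime error:" in text:
            # A new report starts; frames that follow belong to it.
            current = {"frames": 0, "modules": set()}
            traces.append(current)
            continue
        if current is None or not FRAME.match(text):
            continue
        current["frames"] += 1
        found = MODULE.search(text)
        if found:
            current["modules"].add(found["module"])
    return traces
```

A trace whose module set contains several shared libraries would be one whose call sequence crosses library boundaries; the frame count gives the trace length.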

Load-bearing premise

The sanitizer warnings accurately reflect real undefined behavior instead of false positives, and the 59 tasks represent typical desktop usage.

What would settle it

Re-running the 59 tasks and finding, through manual inspection or additional tools, that most warnings do not correspond to undefined behavior actually executed would show the reported extent of undefined behavior to be overstated.

Figures

Figures reproduced from arXiv: 2606.12064 by Jukka Ruohonen, Krzysztof Sierszecki.

Figure 1: Undefined Warnings Generated by PaLs. Chart text extracted with this figure, as share of all unique warnings (%): type mismatches 0.03, variable length arrays 0.05, integer overflows 0.05, buffer overflows 0.14, misaligned accesses 0.82, NULL pointers 2.08, shift operators 8.57, virtual pointers 88.26.
Figure 2: Undefined Warning Categories.
Figure 3: Undefined Warnings Generated by Tasks (only tasks having generated at least one warning are shown).
Figure 5: Trace Lengths Across Tasks Having Generated One or More Warnings.
Original abstract

Undefined behavior is idiomatic to C and C++ programming; such behavior is a use of an erroneous program construct for which the languages impose no requirements, such as integer overflows. The paper presents an empirical experiment seeking to probe the extent of undefined behavior executing underneath typical desktop use of a Linux distribution. The analysis is based on an undefined behavior sanitizer implemented in a compiler. According to the results, undefined behavior is common. By completing 59 simple experimental tasks, nearly 11 thousand unique undefined behavior warnings were generated by 32 unique programs and libraries written in C or C++. Of these warnings, most were associated with the Mesa graphics library and generated by interacting with graphical user interfaces. Merely logging into the GNOME desktop environment generated over 500 unique warnings. Of all warnings, the clear majority was about virtual table pointers. The associated stack traces were also lengthy in general. With these and other results, the paper contributes to the empirical literature on C and C++.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from an empirical experiment that ran an undefined behavior sanitizer (UBSan) while performing 59 simple desktop tasks on a Linux distribution. It claims that undefined behavior is common, citing nearly 11,000 unique UBSan warnings generated by 32 C/C++ programs and libraries. Most warnings were associated with the Mesa graphics library, and the clear majority concerned virtual table pointers; merely logging into GNOME generated over 500 unique warnings.

Significance. If the reported warnings can be shown to correspond to executed undefined behavior rather than sanitizer artifacts, the scale of the measurement would add useful data to the empirical literature on C/C++ safety. The experiment design (desktop tasks on real programs) is a reasonable approach to the question, but the current evidence link is too weak to support the prevalence conclusion.

major comments (2)
  1. [Abstract] The central claim that 'undefined behavior is common' rests on counting ~11k unique UBSan warnings, yet the abstract (and by extension the methods) supplies no information on how uniqueness was determined, how false positives were filtered, or what the 59 tasks consisted of; without this, the data-to-claim mapping cannot be evaluated.
  2. [Results (vtable warnings)] The paper notes that the clear majority of warnings concern virtual table pointers and produce lengthy stack traces, but provides no validation (manual review of a sample, forced-trigger execution, or cross-check with another tool) showing these reflect realized UB on executed paths rather than conservative or heuristic reports from the sanitizer.
minor comments (2)
  1. Provide a table or appendix listing the 32 programs/libraries and the exact 59 tasks so readers can assess representativeness of 'typical desktop use'.
  2. Clarify whether 'unique' warnings are deduplicated by source location, by stack trace, or by another criterion, and report the raw warning count before deduplication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. The feedback identifies opportunities to strengthen the clarity of our empirical claims regarding UBSan warnings in real desktop workloads. We respond to each major comment below, indicating where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'undefined behavior is common' rests on counting ~11k unique UBSan warnings, yet the abstract (and by extension the methods) supplies no information on how uniqueness was determined, how false positives were filtered, or what the 59 tasks consisted of; without this, the data-to-claim mapping cannot be evaluated.

    Authors: The abstract is kept concise per typical length constraints, but the methods section details the 59 tasks (standard desktop actions including GNOME login, application launches, file operations, and web browsing on a stock Linux distribution) and the collection process. Uniqueness was computed via distinct (UBSan check type, source file, line number) tuples to deduplicate repeated triggers of the same site. No post-hoc false-positive filtering was applied, as the study reports all runtime sanitizer detections without assuming any are artifacts. We will revise the abstract to include one sentence summarizing the task set and uniqueness criterion for improved traceability.

    Revision: yes

  2. Referee: [Results (vtable warnings)] The paper notes that the clear majority of warnings concern virtual table pointers and produce lengthy stack traces, but provides no validation (manual review of a sample, forced-trigger execution, or cross-check with another tool) showing these reflect realized UB on executed paths rather than conservative or heuristic reports from the sanitizer.

    Authors: UBSan's vptr check is dynamic instrumentation that verifies at runtime that an object's vtable pointer matches the expected dynamic type when the object is used, for example in member function calls, member accesses, and some conversions between base and derived pointers; a warning is emitted only if the check fails on an executed path. This is not a static or heuristic report but a direct runtime observation of undefined behavior during the 59 tasks. The lengthy traces are characteristic of deep GUI call stacks in Mesa. While we acknowledge that supplementary validation (e.g., targeted reproduction of a sample) could further strengthen the link, the experiment's scale and reliance on a widely accepted runtime tool already provide evidence of prevalence; adding such validation for thousands of warnings would be resource-intensive and is outside the paper's scope. We will insert a short explanatory paragraph on vptr sanitizer semantics in the results section.

    Revision: partial
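
The uniqueness criterion stated in the first response, distinct (check type, source file, line number) tuples, can be sketched as follows. The report line shape ("file:line:col: runtime error: message") is assumed, and the keyword table that maps messages to check types is an illustrative, incomplete guess rather than the authors' mapping. The function also returns the raw count before deduplication, which the referee's second minor comment asks for.

```python
import re
from collections import Counter

# Assumed report shape: "<file>:<line>:<col>: runtime error: <message>".
REPORT = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): runtime error: (?P<msg>.*)$"
)

# Illustrative keyword table (an assumption, not the authors' mapping).
CATEGORIES = [
    ("does not point to an object of type", "vptr"),
    ("shift", "shift"),
    ("null pointer", "null"),
    ("misaligned address", "alignment"),
    ("integer overflow", "integer-overflow"),
    ("out of bounds", "bounds"),
    ("variable length array", "vla-bound"),
]


def classify(message: str) -> str:
    """Map a report message to a coarse check type via keywords."""
    for needle, category in CATEGORIES:
        if needle in message:
            return category
    return "other"


def summarize(lines):
    """Return (raw report count, unique tuples, unique count per check type)."""
    raw = 0
    unique = set()
    for text in lines:
        match = REPORT.match(text.strip())
        if match is None:
            continue  # stack frames, blank lines, other output
        raw += 1
        unique.add((classify(match["msg"]), match["file"], int(match["line"])))
    by_category = Counter(category for category, _, _ in unique)
    return raw, unique, by_category
```

Deduplicating on the tuple means a site that fires thousands of times in a loop counts once, so the unique count measures how many distinct code locations are affected rather than how often they execute.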

Circularity Check

0 steps flagged

No circularity: empirical count of sanitizer warnings with no derivation or self-referential steps

The paper reports results from running 59 tasks on 32 programs and counting UBSan warnings. No equations, no fitted parameters renamed as predictions, no self-citations used as load-bearing premises, and no derivation chain that reduces to its own inputs. The work is a direct measurement study; the central claim follows from the experimental procedure rather than from any internal redefinition or imported uniqueness result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical measurement study that relies on an existing compiler sanitizer; no free parameters, mathematical axioms, or new postulated entities are introduced or required.
