Undefined Behavior in C and C++: An Experiment With Desktop Use Cases
Pith reviewed 2026-06-27 09:09 UTC · model grok-4.3
The pith
Undefined behavior is common in typical desktop use of C and C++ programs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By completing 59 simple experimental tasks, nearly 11 thousand unique undefined behavior warnings were generated by 32 unique programs and libraries written in C or C++. Of these warnings, most were associated with the Mesa graphics library and generated by interacting with graphical user interfaces. Merely logging into the GNOME desktop environment generated over 500 unique warnings. Of all warnings, the clear majority was about virtual table pointers. The associated stack traces were also lengthy in general.
What carries the argument
An undefined behavior sanitizer (UBSan) implemented in a compiler, which reports failed checks as warnings at run time, applied while 59 desktop tasks were carried out; 32 C and C++ programs and libraries produced warnings.
If this is right
- Graphics libraries such as Mesa account for the bulk of observed warnings during normal GUI interaction.
- Virtual table pointer issues form the largest category of warnings.
- Even basic actions like logging into GNOME produce hundreds of unique warnings.
- Stack traces tied to the warnings tend to be lengthy.
- Empirical collection of sanitizer output on real programs is a workable method for quantifying undefined behavior.
Where Pith is reading between the lines
- Routine inclusion of sanitizers in desktop software testing could surface these issues earlier in development.
- The concentration in graphics code suggests targeted review of libraries handling virtual dispatch might reduce warnings.
- Repeating the experiment on different desktop environments or operating systems could test whether the pattern holds more broadly.
- Lengthy stack traces may indicate that the triggering call sequences often cross several library boundaries.
Load-bearing premise
The sanitizer warnings accurately reflect real undefined behavior instead of false positives, and the 59 tasks represent typical desktop usage.
What would settle it
Re-running the 59 tasks and finding, through manual inspection or additional tools, that most warnings do not correspond to genuine undefined behavior would show the reported extent of undefined behavior to be overstated.
Original abstract
Undefined behavior is idiomatic to C and C++ programming; such behavior is a use of an erroneous program construct for which the languages impose no requirements, such as integer overflows. The paper presents an empirical experiment seeking to probe the extent of undefined behavior executing underneath typical desktop use of a Linux distribution. The analysis is based on an undefined behavior sanitizer implemented in a compiler. According to the results, undefined behavior is common. By completing 59 simple experimental tasks, nearly 11 thousand unique undefined behavior warnings were generated by 32 unique programs and libraries written in C or C++. Of these warnings, most were associated with the Mesa graphics library and generated by interacting with graphical user interfaces. Merely logging into the GNOME desktop environment generated over 500 unique warnings. Of all warnings, the clear majority was about virtual table pointers. The associated stack traces were also lengthy in general. With these and other results, the paper contributes to the empirical literature on C and C++.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from an empirical experiment that ran an undefined behavior sanitizer (UBSan) while performing 59 simple desktop tasks on a Linux distribution. It claims that undefined behavior is common, citing nearly 11,000 unique UBSan warnings generated by 32 C/C++ programs and libraries. Most warnings were associated with the Mesa graphics library, and the clear majority concerned virtual table pointers; merely logging into GNOME produced over 500 unique warnings.
Significance. If the reported warnings can be shown to correspond to executed undefined behavior rather than sanitizer artifacts, the scale of the measurement would add useful data to the empirical literature on C/C++ safety. The experiment design (desktop tasks on real programs) is a reasonable approach to the question, but the current evidence link is too weak to support the prevalence conclusion.
major comments (2)
- [Abstract] Abstract: the central claim that 'undefined behavior is common' rests on counting ~11k unique UBSan warnings, yet the abstract (and by extension the methods) supplies no information on how uniqueness was determined, how false positives were filtered, or what the 59 tasks consisted of; without this, the data-to-claim mapping cannot be evaluated.
- [Results (vtable warnings)] Results discussion of vtable warnings: the paper notes that the clear majority of warnings concern virtual table pointers and produce lengthy stack traces, but provides no validation (manual review of a sample, forced-trigger execution, or cross-check with another tool) showing these reflect realized UB on executed paths rather than conservative or heuristic reports from the sanitizer.
minor comments (2)
- Provide a table or appendix listing the 32 programs/libraries and the exact 59 tasks so readers can assess representativeness of 'typical desktop use'.
- Clarify whether 'unique' warnings are deduplicated by source location, by stack trace, or by another criterion, and report the raw warning count before deduplication.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive comments. The feedback identifies opportunities to strengthen the clarity of our empirical claims regarding UBSan warnings in real desktop workloads. We respond to each major comment below, indicating where revisions will be made to the manuscript.
- Referee: [Abstract] Abstract: the central claim that 'undefined behavior is common' rests on counting ~11k unique UBSan warnings, yet the abstract (and by extension the methods) supplies no information on how uniqueness was determined, how false positives were filtered, or what the 59 tasks consisted of; without this, the data-to-claim mapping cannot be evaluated.
  Authors: The abstract is kept concise per typical length constraints, but the methods section details the 59 tasks (standard desktop actions including GNOME login, application launches, file operations, and web browsing on a stock Linux distribution) and the collection process. Uniqueness was computed via distinct (UBSan check type, source file, line number) tuples to deduplicate repeated triggers of the same site. No post-hoc false-positive filtering was applied, as the study reports all runtime sanitizer detections without assuming any are artifacts. We will revise the abstract to include one sentence summarizing the task set and uniqueness criterion for improved traceability.
  Revision: yes
- Referee: [Results (vtable warnings)] Results discussion of vtable warnings: the paper notes that the clear majority of warnings concern virtual table pointers and produce lengthy stack traces, but provides no validation (manual review of a sample, forced-trigger execution, or cross-check with another tool) showing these reflect realized UB on executed paths rather than conservative or heuristic reports from the sanitizer.
  Authors: UBSan's vptr check is a dynamic instrumentation that evaluates the vtable pointer validity immediately prior to a virtual call; a warning is emitted only if the check fails at runtime on an executed path. This is not a static or heuristic report but a direct observation of undefined behavior during the 59 tasks. The lengthy traces are characteristic of deep GUI call stacks in Mesa. While we acknowledge that supplementary validation (e.g., targeted reproduction of a sample) could further strengthen the link, the experiment's scale and reliance on a widely accepted runtime tool already provide evidence of prevalence; adding such validation for thousands of warnings would be resource-intensive and is outside the paper's scope. We will insert a short explanatory paragraph on vptr sanitizer semantics in the results section.
  Revision: partial
Circularity Check
No circularity: empirical count of sanitizer warnings with no derivation or self-referential steps
The paper reports results from running 59 tasks on 32 programs and counting UBSan warnings. No equations, no fitted parameters renamed as predictions, no self-citations used as load-bearing premises, and no derivation chain that reduces to its own inputs. The work is a direct measurement study; the central claim follows from the experimental procedure rather than from any internal redefinition or imported uniqueness result.