Recognition: unknown
Synthesizing Backward Error Bounds, Backward
Pith reviewed 2026-05-10 07:59 UTC · model grok-4.3
The pith
A category-theoretic framework automates synthesis of backward error bounds for floating-point numerical programs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. We implement a tool that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program, handling many programs with variable reuse. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of the
What carries the argument
The category Shel, whose morphisms correspond to backward-stable numerical programs and whose categorical structure enables compositional backward error analyses.
If this is right
- Backward error bounds become automatically derivable for numerical programs that reuse variables.
- Any program successfully analyzed receives a sound, machine-checked guarantee that its floating-point result equals the exact result for a nearby input.
- The same categorical structure can be reused to derive different kinds of backward error statements beyond simple bounds.
- The approach scales to programs that previously required manual, case-by-case reasoning.
Where Pith is reading between the lines
- The framework could be combined with existing forward-error tools to produce joint forward-and-backward guarantees for larger codebases.
- Search within Shel might be adapted to verify other stability-related properties such as mixed forward-backward error or condition-number bounds.
- If the syntactic subcategory can be enlarged without losing search efficiency, the method would cover an even wider class of numerical kernels.
Load-bearing premise
The generalized notion of backward stability holds for common floating-point operations and the chosen syntactic subcategory of Shel remains both expressive enough for variable reuse and searchable by the automated algorithm.
What would settle it
A concrete program containing variable reuse for which the search either reports no bound when one exists by direct calculation or reports a bound that is violated by explicit floating-point execution.
Figures
read the original abstract
Backward stability is a desirable property for a well-designed numerical algorithm: given an input, a backward stable floating-point program produces the exact output for a nearby input. While automated tools for bounding the forward error of a numerical program are well-established, few existing tools target backward error analysis. We present a formal framework that enables sound, automated backward error analysis for a broad class of numerical programs. First, we propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Second, based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. Third, we implement a tool, eggshel, that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program. Our algorithm handles many programs with variable reuse, a known challenge in backward error analysis. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of automated analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a formal framework for sound, automated backward error analysis of numerical programs. It proposes a novel generalization of backward stability that is compositional and satisfied by many floating-point operations, defines the category Shel whose morphisms model stable programs and support various analyses, and presents the eggshel tool that searches a syntactic subcategory of Shel to synthesize backward error bounds. The approach explicitly addresses programs with variable reuse, includes a soundness proof for the search algorithm, and evaluates the tool on a suite of programs previously beyond automated analysis.
Significance. If the central claims hold, the work is significant because it provides the first automated, sound tool for backward error analysis on a broad class of programs including those with variable reuse, a known limitation in prior forward-error-focused tools. The categorical structure in Shel offers a flexible, compositional foundation for stability proofs, and the soundness proof plus practical evaluation on intractable programs represent concrete advances. Strengths include the machine-checked soundness claim and the falsifiable application to a concrete suite; this could influence verification tools in numerical computing and PL.
major comments (2)
- [§4.3] §4.3 (syntactic subcategory of Shel): The restrictions chosen to keep eggshel search terminating are load-bearing for the central claim of handling 'many programs with variable reuse.' The manuscript does not provide an explicit characterization or closure properties showing which common reuse patterns (e.g., array overwrites in loops, temporary scalar reuse across statements) are included versus excluded; without this, it is impossible to verify that the claimed suite is covered by the searchable subcategory rather than requiring manual extensions.
- [§5] §5 (soundness theorem): The proof that eggshel returns sound bounds assumes every program in the target suite lies in the syntactic subcategory and that the generalized stability predicate is preserved by the search rules. No separate lemma or table enumerates the reuse patterns actually present in the evaluation suite, making it difficult to confirm that the soundness result transfers to the 'previously beyond reach' programs cited in the abstract.
minor comments (3)
- [Abstract] Abstract and §2: The acronym 'Shel' is never expanded or motivated etymologically; a brief parenthetical would improve readability.
- [Table 1] Table 1 (evaluation results): Several rows report synthesized bounds without the corresponding forward-error comparison or baseline tool timeout data; adding these columns would strengthen the 'previously intractable' claim.
- [§3.1] Notation in §3.1: The composition operator in Shel is overloaded with the same symbol used for ordinary function composition; a distinct notation or explicit disambiguation would prevent confusion.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. The two major comments both concern the precise scope of the syntactic subcategory of Shel used by eggshel and the transfer of the soundness theorem to the evaluation suite. We address each point below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [§4.3] §4.3 (syntactic subcategory of Shel): The restrictions chosen to keep eggshel search terminating are load-bearing for the central claim of handling 'many programs with variable reuse.' The manuscript does not provide an explicit characterization or closure properties showing which common reuse patterns (e.g., array overwrites in loops, temporary scalar reuse across statements) are included versus excluded; without this, it is impossible to verify that the claimed suite is covered by the searchable subcategory rather than requiring manual extensions.
Authors: We agree that an explicit, self-contained characterization of the syntactic subcategory is needed to make the scope of the termination guarantee fully verifiable. Section 4.3 already defines the subcategory by a finite set of syntactic rules (no intra-loop scalar overwrites except for accumulators, at most one reuse of each temporary scalar between statements, and array overwrites permitted only when the index expression is loop-invariant). However, we did not supply a lemma stating the closure properties under composition or a table of common reuse patterns. In the revision we will add (i) a new lemma that enumerates the admissible reuse patterns with examples of both included and excluded cases, and (ii) a short appendix table that maps each program in the evaluation suite to the specific rules it satisfies. These additions will make it possible to confirm that the suite lies inside the searchable subcategory without manual extension. revision: yes
-
Referee: [§5] §5 (soundness theorem): The proof that eggshel returns sound bounds assumes every program in the target suite lies in the syntactic subcategory and that the generalized stability predicate is preserved by the search rules. No separate lemma or table enumerates the reuse patterns actually present in the evaluation suite, making it difficult to confirm that the soundness result transfers to the 'previously beyond reach' programs cited in the abstract.
Authors: The soundness theorem (Theorem 5.1) is stated for any program belonging to the syntactic subcategory; the proof does not rely on any property outside those rules. To make the transfer to the concrete suite explicit, we will insert a new table (or appendix) that, for each benchmark, lists the reuse patterns it exhibits and the syntactic rules it obeys. Because every benchmark already satisfies the rules used in the definition of the subcategory, the soundness result applies directly. We will also add a one-sentence statement in §5 clarifying that the evaluation programs were selected precisely because they lie inside the subcategory, thereby closing the gap the referee identified. revision: yes
Circularity Check
No circularity: new formal framework with independent soundness proof
full rationale
The paper defines a novel generalization of backward stability, constructs the category Shel based on it, defines a syntactic subcategory for search, and proves soundness of the eggshel algorithm. These steps introduce new structures and a proof rather than reducing any claimed bound or stability result to a fitted parameter, self-definition, or unverified self-citation by construction. The handling of variable reuse is achieved via the subcategory design, which is presented as an independent engineering choice enabling termination and expressiveness, not a tautology. No load-bearing step equates the output analysis to its inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Andrew W. Appel and Ariel E. Kellison. 2024. VCFloat2: Floating-Point Error Analysis in Coq. InACM SIGPLAN Conference on Certified Proofs and Programs (CPP), London, England. Association for Computing Machinery, New York, NY, USA, 14–29. doi:10.1145/3636501.3636953
-
[2]
Michael P. Connolly, Nicholas J. Higham, and Theo Mary. 2021. Stochastic Rounding and Its Probabilistic Backward Error Analysis.SIAM Journal on Scientific Computing43, 1 (2021), A566–A585. doi:10.1137/20M1334796
-
[3]
Robert M. Corless and Nicolas Fillion. 2013.A Graduate Introduction to Numerical Methods(first ed.). Springer-Verlag, New York, NY, USA. doi:10.1007/978-1-4614-8453-0
-
[4]
Florent de Dinechin, Christoph Lauter, and Guillaume Melquiond. 2011. Certifying the Floating-Point Implementation of an Elementary Function Using Gappa.IEEE Trans. Comput.60, 2 (2011), 242–253. doi:10.1109/TC.2010.128
-
[5]
doi:10.48456/tr-213 , number =
Valeria Correa Vaz de Paiva. 1989.The Dialectica categories. Ph. D. Dissertation. University of Cambridge, Computer Laboratory. doi:10.48456/tr-213
-
[6]
Max Fan, Ariel E. Kellison, and Samuel D. Pollard. 2025. Mechanizing Olver’s Error Arithmetic. InInternational Workshop on Verification of Scientific Software, Hamilton, Ontario (Electronic Proceedings in Theoretical Computer Science, Vol. 432). Open Publishing Association, Hamilton, Ontario, Canada, 37–47. doi:10.4204/eptcs.432.6
-
[7]
J. Nathan Foster, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. 2007. Combinators for Bidirectional Tree Transformations: A Linguistic Approach to the View-Update Problem.ACM Transactions on Programming Languages and Systems29, 3 (2007), 17. doi:10.1145/1232420.1232424
-
[8]
Zhoulai Fu, Zhaojun Bai, and Zhendong Su. 2015. Automated Backward Error Analysis for Numerical Code. InACM SIGPLAN Conference on Object Oriented Programming: Systems, Languages, and Applications (OOPSLA), Pittsburgh, Pennsylvania. Association for Computing Machinery, New York, NY, USA, 639–654. doi:10.1145/2814270.2814317
-
[9]
Eric Goubault and Sylvie Putot. 2006. Static Analysis of Numerical Algorithms. InInternational Symposium on Static Analysis (SAS), Seoul, Korea. Springer-Verlag, Berlin, Heidelberg, 18–34. doi:10.1007/11823230_3
-
[10]
Desmond J. Higham and Nicholas J. Higham. 1992. Backward Error and Condition of Structured Linear Systems.SIAM J. Matrix Anal. Appl.13, 1 (1992), 162–175. doi:10.1137/0613014
-
[11]
Nicholas J. Higham. 2002.Accuracy and Stability of Numerical Algorithms(second ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. doi:10.1137/1.9780898718027
-
[12]
Martin Hofmann, Benjamin Pierce, and Daniel Wagner. 2012. Edit Lenses. InACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Philadelphia, Pennsylvania. Association for Computing Machinery, New York, NY, USA, 495–508. doi:10.1145/2103656.2103715
-
[13]
Ariel E. Kellison and Justin Hsu. 2024. Numerical Fuzz: A Type System for Rounding Error Analysis.Proceedings of the ACM on Programming Languages8, PLDI, Article 226 (2024), 25 pages. doi:10.1145/3656456
-
[14]
Kellison, Laura Zielinski, David Bindel, and Justin Hsu
Ariel E. Kellison, Laura Zielinski, David Bindel, and Justin Hsu. 2025. Bean: A Language for Backward Error Analysis. Proceedings of the ACM on Programming Languages9, PLDI, Article 221 (2025), 25 pages. doi:10.1145/3729324 Synthesizing Backward Error Bounds, Backward 23
-
[15]
Webb Miller and David Spooner. 1978. Algorithm 532: Software for Roundoff Analysis [Z].ACM Trans. Math. Software 4, 4 (1978), 388–390. doi:10.1145/356502.356497
-
[16]
F. W. J. Olver. 1978. A New Approach to Error Arithmetic.SIAM J. Numer. Anal.15, 2 (1978), 368–393. doi:10.1137/ 0715024
1978
-
[17]
Pavel Panchekha, Alex Sanchez-Stern, James R. Wilcox, and Zachary Tatlock. 2015. Automatically Improving Accuracy for Floating Point Expressions. InACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Portland, Oregon. Association for Computing Machinery, New York, NY, USA, 1–11. doi:10.1145/2737924.2737959
-
[18]
1990.Functional Stability Analysis of Numerical Algorithms
Thomas Harvey Rowan. 1990.Functional Stability Analysis of Numerical Algorithms. Ph. D. Dissertation. University of Texas at Austin, USA
1990
-
[19]
Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan
Alexey Solovyev, Marek S. Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan
-
[20]
Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan
Rigorous Estimation of Floating-Point Round-Off Errors with Symbolic Taylor Expansions.ACM Transactions on Programming Languages and Systems41, 1, Article 2 (2018), 39 pages. doi:10.1145/3230733
-
[21]
Feliu, Paolo Masci, and César A
Laura Titolo, Mariano Moscato, Marco A. Feliu, Paolo Masci, and César A. Muñoz. 2024. Rigorous Floating-Point Round- Off Error Analysis in PRECiSA 4.0. InInternational Symposium on Formal Methods (FM), Milan, Italy. Springer-Verlag, Cham, Switzerland, 20–38. doi:10.1007/978-3-031-71177-0_2
-
[22]
2023.Rounding Errors in Algebraic Processes
James Hardy Wilkinson. 2023.Rounding Errors in Algebraic Processes. Society for Industrial and Applied Mathematics, Philadelphia, PA. doi:10.1137/1.9781611977523 Originally published 1963
-
[23]
Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. 2023. Better Together: Unifying Datalog and Equality Saturation.Proceedings of the ACM on Programming Languages7, PLDI, Article 125 (2023), 25 pages. doi:10.1145/3591239
-
[24]
Mattukat, Vincent Schmandt, Langstrof Timo, Zerbe Michael, and Horst Lichter
Laura Zielinski and Justin Hsu. 2026.eggshel: A Floating-Point Backward Error Analysis Tool. doi:10.5281/zenodo. 19391349 24 Laura Zielinski and Justin Hsu A Backward Error Analysis By Hand We will now show an example of performing backward error analysis by hand. This is the type of analysis that our automation emulates. It is a well-known fact that the ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.