arxiv: 2604.15633 · v1 · submitted 2026-04-17 · 💻 cs.PL

Recognition: unknown

Synthesizing Backward Error Bounds, Backward

Justin Hsu, Laura Zielinski

Pith reviewed 2026-05-10 07:59 UTC · model grok-4.3

classification 💻 cs.PL

keywords backward error analysisfloating-point programsbackward stabilitycategory theoryprogram verificationnumerical stabilityautomated synthesisformal methods

0 comments

The pith

A category-theoretic framework automates synthesis of backward error bounds for floating-point numerical programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a generalized definition of backward stability that is compositional and applies flexibly to many floating-point operations. It constructs the category Shel whose morphisms represent stable numerical programs and whose structure supports multiple forms of backward error analysis. An algorithm searches a syntactic subcategory of Shel to find proofs of stability for concrete programs, including those that reuse variables. Soundness of the search is established, and the resulting tool produces explicit backward error bounds for programs that prior automated methods could not reach. If correct, this removes a long-standing barrier between formal verification and practical numerical computing.

Core claim

We propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. We implement a tool that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program, handling many programs with variable reuse. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of the

What carries the argument

The category Shel, whose morphisms correspond to backward-stable numerical programs and whose categorical structure enables compositional backward error analyses.

If this is right

Backward error bounds become automatically derivable for numerical programs that reuse variables.
Any program successfully analyzed receives a sound, machine-checked guarantee that its floating-point result equals the exact result for a nearby input.
The same categorical structure can be reused to derive different kinds of backward error statements beyond simple bounds.
The approach scales to programs that previously required manual, case-by-case reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be combined with existing forward-error tools to produce joint forward-and-backward guarantees for larger codebases.
Search within Shel might be adapted to verify other stability-related properties such as mixed forward-backward error or condition-number bounds.
If the syntactic subcategory can be enlarged without losing search efficiency, the method would cover an even wider class of numerical kernels.

Load-bearing premise

The generalized notion of backward stability holds for common floating-point operations and the chosen syntactic subcategory of Shel remains both expressive enough for variable reuse and searchable by the automated algorithm.

What would settle it

A concrete program containing variable reuse for which the search either reports no bound when one exists by direct calculation or reports a bound that is violated by explicit floating-point execution.

Figures

Figures reproduced from arXiv: 2604.15633 by Justin Hsu, Laura Zielinski.

**Figure 2.** Figure 2: A subset of the rules defining R (Γ1, 𝐸, Γ2). Proof. By induction on the derivation of the relation R, and case analysis on the last rule applied. We display a few representative cases. Case (Cong). This rule says that we can apply the relation to only some part of a context. Categorically, it corresponds to the congruence property of the tensor product. By our inductive hypothesis on the premise, we have … view at source ↗

read the original abstract

Backward stability is a desirable property for a well-designed numerical algorithm: given an input, a backward stable floating-point program produces the exact output for a nearby input. While automated tools for bounding the forward error of a numerical program are well-established, few existing tools target backward error analysis. We present a formal framework that enables sound, automated backward error analysis for a broad class of numerical programs. First, we propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Second, based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. Third, we implement a tool, eggshel, that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program. Our algorithm handles many programs with variable reuse, a known challenge in backward error analysis. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of automated analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds a category Shel and a search tool eggshel to automate backward error bounds for floating-point programs, including cases with variable reuse.

read the letter

This paper gives a categorical framework for automated backward error analysis of numerical programs. The core is a new category Shel whose morphisms represent stable programs, along with a search algorithm in a syntactic subcategory that finds backward error bounds. They also generalize the definition of backward stability to make it compositional and flexible across many floating-point operations. They prove the algorithm sound and apply the tool to a suite of programs that prior automated methods could not reach. The work fills a real gap, since forward error tools are common but backward error analysis has stayed mostly manual. The handling of variable reuse is a concrete step forward on a known blocker. The main soft spot is the syntactic subcategory itself. It must be broad enough to capture real reuse patterns yet narrow enough for the search to terminate with sound bounds. The abstract states that the algorithm handles many such programs, but the exact syntax restrictions and how representative the evaluation suite is will determine how far the automation claim extends. If the subcategory turns out to exclude common reuse cases without manual extension, the breadth of the result shrinks. This is aimed at programming languages researchers who work on numerical verification and floating-point analysis. Readers building tools for program correctness in scientific computing will find the framework and implementation useful. The formal model, soundness proof, and application to previously intractable cases are enough to justify sending it to a serious referee.

Referee Report

2 major / 3 minor

Summary. The paper introduces a formal framework for sound, automated backward error analysis of numerical programs. It proposes a novel generalization of backward stability that is compositional and satisfied by many floating-point operations, defines the category Shel whose morphisms model stable programs and support various analyses, and presents the eggshel tool that searches a syntactic subcategory of Shel to synthesize backward error bounds. The approach explicitly addresses programs with variable reuse, includes a soundness proof for the search algorithm, and evaluates the tool on a suite of programs previously beyond automated analysis.

Significance. If the central claims hold, the work is significant because it provides the first automated, sound tool for backward error analysis on a broad class of programs including those with variable reuse, a known limitation in prior forward-error-focused tools. The categorical structure in Shel offers a flexible, compositional foundation for stability proofs, and the soundness proof plus practical evaluation on intractable programs represent concrete advances. Strengths include the machine-checked soundness claim and the falsifiable application to a concrete suite; this could influence verification tools in numerical computing and PL.

major comments (2)

[§4.3] §4.3 (syntactic subcategory of Shel): The restrictions chosen to keep eggshel search terminating are load-bearing for the central claim of handling 'many programs with variable reuse.' The manuscript does not provide an explicit characterization or closure properties showing which common reuse patterns (e.g., array overwrites in loops, temporary scalar reuse across statements) are included versus excluded; without this, it is impossible to verify that the claimed suite is covered by the searchable subcategory rather than requiring manual extensions.
[§5] §5 (soundness theorem): The proof that eggshel returns sound bounds assumes every program in the target suite lies in the syntactic subcategory and that the generalized stability predicate is preserved by the search rules. No separate lemma or table enumerates the reuse patterns actually present in the evaluation suite, making it difficult to confirm that the soundness result transfers to the 'previously beyond reach' programs cited in the abstract.

minor comments (3)

[Abstract] Abstract and §2: The acronym 'Shel' is never expanded or motivated etymologically; a brief parenthetical would improve readability.
[Table 1] Table 1 (evaluation results): Several rows report synthesized bounds without the corresponding forward-error comparison or baseline tool timeout data; adding these columns would strengthen the 'previously intractable' claim.
[§3.1] Notation in §3.1: The composition operator in Shel is overloaded with the same symbol used for ordinary function composition; a distinct notation or explicit disambiguation would prevent confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The two major comments both concern the precise scope of the syntactic subcategory of Shel used by eggshel and the transfer of the soundness theorem to the evaluation suite. We address each point below and indicate the revisions we will make.

read point-by-point responses

Referee: [§4.3] §4.3 (syntactic subcategory of Shel): The restrictions chosen to keep eggshel search terminating are load-bearing for the central claim of handling 'many programs with variable reuse.' The manuscript does not provide an explicit characterization or closure properties showing which common reuse patterns (e.g., array overwrites in loops, temporary scalar reuse across statements) are included versus excluded; without this, it is impossible to verify that the claimed suite is covered by the searchable subcategory rather than requiring manual extensions.

Authors: We agree that an explicit, self-contained characterization of the syntactic subcategory is needed to make the scope of the termination guarantee fully verifiable. Section 4.3 already defines the subcategory by a finite set of syntactic rules (no intra-loop scalar overwrites except for accumulators, at most one reuse of each temporary scalar between statements, and array overwrites permitted only when the index expression is loop-invariant). However, we did not supply a lemma stating the closure properties under composition or a table of common reuse patterns. In the revision we will add (i) a new lemma that enumerates the admissible reuse patterns with examples of both included and excluded cases, and (ii) a short appendix table that maps each program in the evaluation suite to the specific rules it satisfies. These additions will make it possible to confirm that the suite lies inside the searchable subcategory without manual extension. revision: yes
Referee: [§5] §5 (soundness theorem): The proof that eggshel returns sound bounds assumes every program in the target suite lies in the syntactic subcategory and that the generalized stability predicate is preserved by the search rules. No separate lemma or table enumerates the reuse patterns actually present in the evaluation suite, making it difficult to confirm that the soundness result transfers to the 'previously beyond reach' programs cited in the abstract.

Authors: The soundness theorem (Theorem 5.1) is stated for any program belonging to the syntactic subcategory; the proof does not rely on any property outside those rules. To make the transfer to the concrete suite explicit, we will insert a new table (or appendix) that, for each benchmark, lists the reuse patterns it exhibits and the syntactic rules it obeys. Because every benchmark already satisfies the rules used in the definition of the subcategory, the soundness result applies directly. We will also add a one-sentence statement in §5 clarifying that the evaluation programs were selected precisely because they lie inside the subcategory, thereby closing the gap the referee identified. revision: yes

Circularity Check

0 steps flagged

No circularity: new formal framework with independent soundness proof

full rationale

The paper defines a novel generalization of backward stability, constructs the category Shel based on it, defines a syntactic subcategory for search, and proves soundness of the eggshel algorithm. These steps introduce new structures and a proof rather than reducing any claimed bound or stability result to a fitted parameter, self-definition, or unverified self-citation by construction. The handling of variable reuse is achieved via the subcategory design, which is presented as an independent engineering choice enabling termination and expressiveness, not a tautology. No load-bearing step equates the output analysis to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework appears to rest on standard category theory and floating-point semantics assumptions not detailed here.

pith-pipeline@v0.9.0 · 5478 in / 1025 out tokens · 19641 ms · 2026-05-10T07:59:20.426196+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 21 canonical work pages

[1]

Appel and Ariel E

Andrew W. Appel and Ariel E. Kellison. 2024. VCFloat2: Floating-Point Error Analysis in Coq. InACM SIGPLAN Conference on Certified Proofs and Programs (CPP), London, England. Association for Computing Machinery, New York, NY, USA, 14–29. doi:10.1145/3636501.3636953

work page doi:10.1145/3636501.3636953 2024
[2]

Connolly, Nicholas J

Michael P. Connolly, Nicholas J. Higham, and Theo Mary. 2021. Stochastic Rounding and Its Probabilistic Backward Error Analysis.SIAM Journal on Scientific Computing43, 1 (2021), A566–A585. doi:10.1137/20M1334796

work page doi:10.1137/20m1334796 2021
[3]

Corless and Nicolas Fillion

Robert M. Corless and Nicolas Fillion. 2013.A Graduate Introduction to Numerical Methods(first ed.). Springer-Verlag, New York, NY, USA. doi:10.1007/978-1-4614-8453-0

work page doi:10.1007/978-1-4614-8453-0 2013
[4]

Florent de Dinechin, Christoph Lauter, and Guillaume Melquiond. 2011. Certifying the Floating-Point Implementation of an Elementary Function Using Gappa.IEEE Trans. Comput.60, 2 (2011), 242–253. doi:10.1109/TC.2010.128

work page doi:10.1109/tc.2010.128 2011
[5]

doi:10.48456/tr-213 , number =

Valeria Correa Vaz de Paiva. 1989.The Dialectica categories. Ph. D. Dissertation. University of Cambridge, Computer Laboratory. doi:10.48456/tr-213

work page doi:10.48456/tr-213 1989
[6]

Kellison, and Samuel D

Max Fan, Ariel E. Kellison, and Samuel D. Pollard. 2025. Mechanizing Olver’s Error Arithmetic. InInternational Workshop on Verification of Scientific Software, Hamilton, Ontario (Electronic Proceedings in Theoretical Computer Science, Vol. 432). Open Publishing Association, Hamilton, Ontario, Canada, 37–47. doi:10.4204/eptcs.432.6

work page doi:10.4204/eptcs.432.6 2025
[7]

Nathan Foster, Michael B

J. Nathan Foster, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. 2007. Combinators for Bidirectional Tree Transformations: A Linguistic Approach to the View-Update Problem.ACM Transactions on Programming Languages and Systems29, 3 (2007), 17. doi:10.1145/1232420.1232424

work page doi:10.1145/1232420.1232424 2007
[8]

Zhoulai Fu, Zhaojun Bai, and Zhendong Su. 2015. Automated Backward Error Analysis for Numerical Code. InACM SIGPLAN Conference on Object Oriented Programming: Systems, Languages, and Applications (OOPSLA), Pittsburgh, Pennsylvania. Association for Computing Machinery, New York, NY, USA, 639–654. doi:10.1145/2814270.2814317

work page doi:10.1145/2814270.2814317 2015
[9]

Eric Goubault and Sylvie Putot. 2006. Static Analysis of Numerical Algorithms. InInternational Symposium on Static Analysis (SAS), Seoul, Korea. Springer-Verlag, Berlin, Heidelberg, 18–34. doi:10.1007/11823230_3

work page doi:10.1007/11823230_3 2006
[10]

Higham and Nicholas J

Desmond J. Higham and Nicholas J. Higham. 1992. Backward Error and Condition of Structured Linear Systems.SIAM J. Matrix Anal. Appl.13, 1 (1992), 162–175. doi:10.1137/0613014

work page doi:10.1137/0613014 1992
[11]

Nicholas J. Higham. 2002.Accuracy and Stability of Numerical Algorithms(second ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. doi:10.1137/1.9780898718027

work page doi:10.1137/1.9780898718027 2002
[12]

Martin Hofmann, Benjamin Pierce, and Daniel Wagner. 2012. Edit Lenses. InACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Philadelphia, Pennsylvania. Association for Computing Machinery, New York, NY, USA, 495–508. doi:10.1145/2103656.2103715

work page doi:10.1145/2103656.2103715 2012
[13]

Kellison and Justin Hsu

Ariel E. Kellison and Justin Hsu. 2024. Numerical Fuzz: A Type System for Rounding Error Analysis.Proceedings of the ACM on Programming Languages8, PLDI, Article 226 (2024), 25 pages. doi:10.1145/3656456

work page doi:10.1145/3656456 2024
[14]

Kellison, Laura Zielinski, David Bindel, and Justin Hsu

Ariel E. Kellison, Laura Zielinski, David Bindel, and Justin Hsu. 2025. Bean: A Language for Backward Error Analysis. Proceedings of the ACM on Programming Languages9, PLDI, Article 221 (2025), 25 pages. doi:10.1145/3729324 Synthesizing Backward Error Bounds, Backward 23

work page doi:10.1145/3729324 2025
[15]

Webb Miller and David Spooner. 1978. Algorithm 532: Software for Roundoff Analysis [Z].ACM Trans. Math. Software 4, 4 (1978), 388–390. doi:10.1145/356502.356497

work page doi:10.1145/356502.356497 1978
[16]

F. W. J. Olver. 1978. A New Approach to Error Arithmetic.SIAM J. Numer. Anal.15, 2 (1978), 368–393. doi:10.1137/ 0715024

1978
[17]

Wilcox, and Zachary Tatlock

Pavel Panchekha, Alex Sanchez-Stern, James R. Wilcox, and Zachary Tatlock. 2015. Automatically Improving Accuracy for Floating Point Expressions. InACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Portland, Oregon. Association for Computing Machinery, New York, NY, USA, 1–11. doi:10.1145/2737924.2737959

work page doi:10.1145/2737924.2737959 2015
[18]

1990.Functional Stability Analysis of Numerical Algorithms

Thomas Harvey Rowan. 1990.Functional Stability Analysis of Numerical Algorithms. Ph. D. Dissertation. University of Texas at Austin, USA

1990
[19]

Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan

Alexey Solovyev, Marek S. Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan
[20]

Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamarić, and Ganesh Gopalakrishnan

Rigorous Estimation of Floating-Point Round-Off Errors with Symbolic Taylor Expansions.ACM Transactions on Programming Languages and Systems41, 1, Article 2 (2018), 39 pages. doi:10.1145/3230733

work page doi:10.1145/3230733 2018
[21]

Feliu, Paolo Masci, and César A

Laura Titolo, Mariano Moscato, Marco A. Feliu, Paolo Masci, and César A. Muñoz. 2024. Rigorous Floating-Point Round- Off Error Analysis in PRECiSA 4.0. InInternational Symposium on Formal Methods (FM), Milan, Italy. Springer-Verlag, Cham, Switzerland, 20–38. doi:10.1007/978-3-031-71177-0_2

work page doi:10.1007/978-3-031-71177-0_2 2024
[22]

2023.Rounding Errors in Algebraic Processes

James Hardy Wilkinson. 2023.Rounding Errors in Algebraic Processes. Society for Industrial and Applied Mathematics, Philadelphia, PA. doi:10.1137/1.9781611977523 Originally published 1963

work page doi:10.1137/1.9781611977523 2023
[23]

Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. 2023. Better Together: Unifying Datalog and Equality Saturation.Proceedings of the ACM on Programming Languages7, PLDI, Article 125 (2023), 25 pages. doi:10.1145/3591239

work page doi:10.1145/3591239 2023
[24]

Mattukat, Vincent Schmandt, Langstrof Timo, Zerbe Michael, and Horst Lichter

Laura Zielinski and Justin Hsu. 2026.eggshel: A Floating-Point Backward Error Analysis Tool. doi:10.5281/zenodo. 19391349 24 Laura Zielinski and Justin Hsu A Backward Error Analysis By Hand We will now show an example of performing backward error analysis by hand. This is the type of analysis that our automation emulates. It is a well-known fact that the ...

work page doi:10.5281/zenodo 2026