pith. sign in

arxiv: 2101.06757 · v9 · submitted 2021-01-17 · 💻 cs.PL · cs.LO

Higher Order Automatic Differentiation of Higher Order Functions

Pith reviewed 2026-05-24 14:12 UTC · model grok-4.3

classification 💻 cs.PL cs.LO
keywords automatic differentiationhigher-order functionsdiffeological spacessemantic correctnessgluing constructionforward-mode ADalgebraic data typesTaylor approximation
0
0 comments X

The pith

Forward-mode automatic differentiation on higher-order languages is the unique structure-preserving macro and is correct by a gluing construction on diffeological spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a forward-mode AD transformation on a higher-order functional language with algebraic data types is semantically correct. It first characterises the transformation as the only way to extend given derivatives of basic operations while preserving the language's structure. A semantics based on diffeological spaces is then used to interpret both the original language and the transformed programs. Correctness follows from a gluing construction that links the two interpretations, shown to be equivalent to a logical-relations argument. The same technique is shown to cover the computation of higher-order derivatives via Taylor expansions.

Core claim

The forward-mode AD method is the unique structure-preserving macro extending a choice of derivatives for primitives; this uniqueness yields a correctness proof relative to the diffeological-space semantics of the language via an explicit gluing construction that equates the AD-transformed term with the semantic derivative, and the argument extends directly to higher-order derivatives obtained by Taylor approximation.

What carries the argument

The gluing construction on diffeological spaces that realises the logical relation between a term and its AD image.

If this is right

  • Derivatives computed by the macro match the semantic derivative for every well-typed program in the language.
  • The same uniqueness argument supplies correctness for the Taylor-based computation of second- and higher-order derivatives.
  • Any other AD method that also preserves the language structure must coincide with this one.
  • The proof technique is independent of the concrete choice of derivatives for the basic operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The gluing technique may adapt to prove correctness of reverse-mode or mixed-mode AD on the same language.
  • The diffeological model could serve as a target for other program analyses that must respect differentiation.
  • Similar uniqueness-plus-gluing arguments might apply to languages that add recursion or probabilistic choice.

Load-bearing premise

Diffeological spaces give a faithful semantics for the higher-order language with algebraic data types, and the AD macro is the only structure-preserving extension of the chosen basic derivatives.

What would settle it

A concrete higher-order program (for example a function that takes a function as argument) whose AD image, when interpreted in the diffeological semantics, fails to equal the semantic derivative of the original program.

Figures

Figures reproduced from arXiv: 2101.06757 by Mathieu Huot, Matthijs V\'ak\'ar, Sam Staton.

Figure 1
Figure 1. Figure 1: Overview of semantics/correctness of AD. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The network in (1.1) with k inputs and two hidden layers. Here P ∼= realp with p = (m(k+1)+n(m+1)+n+1). This program (1.1) describes a smooth (infinitely differentiable) function. The goal of automatic differentiation is to find its derivative. If we β-reduce all the λ’s, we end up with a very long function expression just built from the sigmoid function and linear algebra. We can then find a program for c… view at source ↗
Figure 3
Figure 3. Figure 3: Typing rules for the simple language [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Additional typing rules for the extended language. [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Standard βη-laws (e.g. [Pit95]) for products, functions, variants and lists [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗
read the original abstract

We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher-order language with algebraic data types and we characterise it as the unique structure-preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming based on diffeological spaces. We show that it interprets our language and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is in essence a logical relations argument. Throughout we show how the analysis extends to AD methods for computing higher-order derivatives using a Taylor approximation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents semantic correctness proofs for a forward-mode automatic differentiation (AD) method on a higher-order functional language with algebraic data types. It characterizes the AD transformation as the unique structure-preserving macro given derivatives for basic operations, develops a semantics based on diffeological spaces that interprets the language, and proves correctness of the AD method via a gluing construction on these spaces (equivalent to a logical relations argument). The analysis is shown to extend to higher-order derivatives through Taylor approximation.

Significance. If the central claims hold, the work provides an elegant and rigorous semantic foundation for the correctness of AD in higher-order settings with ADTs. The characterization as a unique structure-preserving macro and the use of a gluing construction for the proof are notable strengths that could influence semantic models for differentiable programming. The extension to higher-order derivatives adds breadth, though the primary contribution is the semantic proof technique.

minor comments (3)
  1. The introduction would benefit from a small concrete example of the AD macro applied to a higher-order function involving an algebraic data type to ground the abstract claims before the semantic development.
  2. Section headings and subsection numbering for the gluing construction and its relation to logical relations could be made more explicit to help readers trace the proof structure.
  3. The notation used for the Taylor approximation when extending to higher-order derivatives should be cross-referenced more clearly to the basic first-order case.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, recognition of the semantic contributions, and recommendation to accept the paper.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via external semantics

full rationale

The paper characterises forward-mode AD as the unique structure-preserving macro given derivatives for basic operations and proves correctness via a gluing construction (logical relations) on diffeological spaces. This relies on external mathematical structures (diffeological spaces semantics and gluing) that interpret the higher-order language with ADTs, rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The abstract and reader's assessment indicate the model and construction are exhibited directly, with no reduction of the central claim to its own inputs by construction. The extension to higher-order derivatives via Taylor approximation follows the same external framework.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that diffeological spaces supply an appropriate semantics for the language and that the gluing construction correctly establishes equivalence to the mathematical derivative; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption Diffeological spaces provide a rich semantics for differentiable programming that interprets the higher-order language with algebraic data types.
    Invoked to give meaning to programs and to phrase what correctness of AD means.

pith-pipeline@v0.9.0 · 5649 in / 1251 out tokens · 30861 ms · 2026-05-24T14:12:01.399939+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 6 internal anchors

  1. [1]

    A Geometric Theory of Higher-Order Automatic Differentiation

    To appear. [Bet18] Michael Betancourt. A geometric theory of higher-order automatic differentiation. arXiv preprint arXiv:1812.11592,

  2. [2]

    Differentiating a tensor language

    [BML+20] Gilbert Bernstein, Michael Mara, Tzu-Mao Li, Dougal Maclaurin, and Jonathan Ragan-Kelley. Differentiating a tensor language. arXiv preprint arXiv:2008.11256 ,

  3. [3]

    Backpropagation in the simply typed lambda-calculus with linear negation

    [BMP20] Alois Brunel, Damiano Mazza, and Michele Pagani. Backpropagation in the simply typed lambda-calculus with linear negation. In Proc. POPL 2020,

  4. [4]

    Fadbad, a flexible C++ package for automatic differentiation

    [BS96] Claus Bendtsen and Ole Stauning. Fadbad, a flexible C++ package for automatic differentiation. Technical report, Technical Report IMM–REP–1996–17, Department of Mathematical Modelling, Technical University of Denmark, Lyngby,

  5. [5]

    Tadiff, a flexible c++ package for automatic differentiation

    [BS97] Claus Bendtsen and Ole Stauning. Tadiff, a flexible c++ package for automatic differentiation. TU of Denmark, Department of Mathematical Modelling, Lungby. Technical report IMM-REP- 1997-07,

  6. [6]

    [CCG+20] J. Robin B. Cockett, Geoff S. H. Cruttwell, Jonathan Gallagher, Jean-Simon Pacaud Lemay, Benjamin MacAdam, Gordon D. Plotkin, and Dorette Pronk. Reverse derivative categories. In Proc. CSL 2020,

  7. [7]

    Towards formalizing and extending differential programming using tangent categories

    [CGM19] Geoff Cruttwell, Jonathan Gallagher, and Ben MacAdam. Towards formalizing and extending differential programming using tangent categories. In Proc. ACT 2019,

  8. [8]

    The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

    [CHB+15] Bob Carpenter, Matthew D Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164,

  9. [9]

    Tangent spaces and tangent bundles for diffeological spaces

    Vol. 18:1 HIGHER ORDER AUTOMATIC DIFFERENTIATION OF HIGHER ORDER FUNCTIONS 41:31 [CW14] J Daniel Christensen and Enxin Wu. Tangent spaces and tangent bundles for diffeological spaces. arXiv preprint arXiv:1411.5425 ,

  10. [10]

    Backprop as functor: A compositional perspective on supervised learning

    [FST19] Brendan Fong, David Spivak, and R´ emy Tuy´ eras. Backprop as functor: A compositional perspective on supervised learning. In 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) , pages 1–13. IEEE,

  11. [11]

    [IZ13] Patrick Iglesias-Zemmour

    arxiv:2001.02209. [IZ13] Patrick Iglesias-Zemmour. Diffeology. American Mathematical Soc.,

  12. [12]

    Quasitoposes, quasiadhesive categories and Artin glueing

    [JLS07] Peter T Johnstone, Stephen Lack, and P Sobocinski. Quasitoposes, quasiadhesive categories and Artin glueing. In Proc. CALCO 2007,

  13. [13]

    Adam: A Method for Stochastic Optimization

    [KB14] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,

  14. [14]

    A simply typed λ-calculus of forward automatic differentiation

    [Man12] Oleksandr Manzyuk. A simply typed λ-calculus of forward automatic differentiation. In Proc. MFPS 2012,

  15. [15]

    Four explicit formulas for the prolongations of an infinitesimal Lie symmetry and multivariate Faa di Bruno formulas

    [Mer04] Joel Merker. Four explicit formulas for the prolongations of an infinitesimal lie symmetry and multivariate Faa di Bruno formulas. arXiv preprint math/0411650 ,

  16. [16]

    A differential-form pullback programming language for higher-order reverse-mode automatic differentiation

    [MO20] Carol Mak and Luke Ong. A differential-form pullback programming language for higher-order reverse-mode automatic differentiation. arxiv:2002.08241,

  17. [17]

    [MS92] John C Mitchell and Andre Scedrov

    doi:10.1145/3434309. [MS92] John C Mitchell and Andre Scedrov. Notes on sconing and relators. In International Workshop on Computer Science Logic , pages 352–378. Springer,

  18. [18]

    Some principles of differential programming languages

    [Plo18] Gordon D Plotkin. Some principles of differential programming languages. Invited talk, POPL 2018,

  19. [19]

    λS: Computable semantics for differen- tiable programming with higher-order functions and datatypes

    [SMC20] Benjamin Sherman, Jesse Michel, and Michael Carbin. λS: Computable semantics for differen- tiable programming with higher-order functions and datatypes. arXiv preprint arXiv:2007.08017,

  20. [20]

    Denotational correctness of foward-mode automatic differentiation for iteration and recursion

    [V´ ak20] Matthijs V´ ak´ ar. Denotational correctness of foward-mode automatic differentiation for iteration and recursion. arXiv preprint arXiv:2007.05282 ,

  21. [21]

    CHAD: Combinatory Homomorphic Automatic Differentiation

    Vol. 18:1 HIGHER ORDER AUTOMATIC DIFFERENTIATION OF HIGHER ORDER FUNCTIONS 41:33 [VS21] Matthijs V´ ak´ ar and Tom Smeding. CHAD: Combinatory homomorphic automatic differentiation. arXiv preprint arXiv:2103.15776 ,

  22. [22]

    On the principles of differentiable quantum programming languages

    [ZHCW20] Shaopeng Zhu, Shih-Han Hung, Shouvanik Chakrabarti, and Xiaodi Wu. On the principles of differentiable quantum programming languages. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020 , pages 272–285. ACM,

  23. [23]

    doi:10.1145/3385412.3386011. 41:34 M. Huot, S. Staton, and M. V ´ak´ar Vol. 18:1 Appendix A. CartSp and Man are not cartesian closed categories Lemma A.1. There is no continuous injection Rd+1→ Rd. Proof. If there were, it would restrict to a continuous injection Sd→ Rd. The Borsuk-Ulam theorem, however, tells us that every continuous f : Sd→ Rd has some ...