pith. sign in

arxiv: 2605.28983 · v1 · pith:D7DOJLGPnew · submitted 2026-05-27 · 💻 cs.LG · cs.AI· math.DS· math.RT· physics.comp-ph

The Hamilton-Jacobi Theory of Deep Learning

classification 💻 cs.LG cs.AImath.DSmath.RTphysics.comp-ph
keywords hamilton--jacobivarepsilonarchitecturesattributiondataequationhamiltonianinitial
0
0 comments X
read the original abstract

In this paper, training a neural network is identified, exactly, as a search through Hamilton--Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton--Jacobi equation whose Hopf--Cole propagator best fits the observations; at inference, the input is the spatial point at which that solution is evaluated and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton--Jacobi equations, with architecture-dependent Hamiltonian and viscosity. A single deformation parameter $\varepsilon$ unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax optimal generalization rate $O(n^{-1/(d+2)})$ for fixed $t$; adversarial robustness controlled by $\varepsilon$; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with data intrinsic dimension via PDE quadrature; and a closed-form $O(N)$ influence function (softmax attribution weights $\pi_j$) whose entropy landscape undergoes fold bifurcations as $\varepsilon$ increases, each merging attribution basins.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.