arxiv: 2605.12301 · v1 · submitted 2026-05-12 · 💻 cs.LG · math.ST· stat.TH

Recognition: no theorem link

Approximation of Maximally Monotone Operators : A Graph Convergence Perspective

Takashi Furuya , Yury Korolev , Takaharu Yaguchi

Authors on Pith no claims yet

Pith reviewed 2026-05-13 07:15 UTC · model grok-4.3

classification 💻 cs.LG math.STstat.TH

keywords maximally monotone operatorsgraph convergenceoperator approximationencoder-decoder architecturesresolvent parameterizationset-valued operatorsoperator learning

0 comments

The pith

Any maximally monotone operator can be approximated in local graph convergence by continuous encoder-decoder architectures while preserving maximal monotonicity via resolvent parameterizations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional uniform and L^p approximations fail for discontinuous or set-valued operators such as differential operators. The paper instead uses local graph convergence as the appropriate notion for closed operators. It proves that continuous encoder-decoder architectures can achieve local graph convergence to any maximally monotone operator. It further constructs approximations that retain maximal monotonicity by using resolvent-based parameterizations.

Core claim

The paper shows that every maximally monotone operator admits approximations in the sense of local graph convergence by continuous encoder-decoder architectures. It additionally constructs structure-preserving versions of these approximations that remain maximally monotone through resolvent-based parameterizations.

What carries the argument

Local graph convergence (Painlevé-Kuratowski sense) of continuous encoder-decoder architectures, with resolvent-based parameterizations to enforce maximal monotonicity.

If this is right

Uniform and L^p approximations are inadequate for closed operators.
Continuous encoder-decoder architectures suffice for local graph convergence approximations of all maximally monotone operators.
Resolvent-based constructions yield approximating operators that remain maximally monotone.
Operator learning becomes feasible for discontinuous and set-valued maps outside classical continuous frameworks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could support learning of solution maps for variational inequalities or optimization problems involving monotone operators.
Numerical schemes for physical systems governed by such operators might gain stability from these approximations.
Similar graph-convergence ideas could be tested on other classes of closed operators beyond the monotone case.

Load-bearing premise

That local graph convergence is the appropriate notion for practical approximation of maximally monotone operators.

What would settle it

A concrete maximally monotone operator together with a proof that no continuous encoder-decoder sequence achieves local graph convergence to it, or that the resolvent parameterization fails to preserve monotonicity.

Figures

Figures reproduced from arXiv: 2605.12301 by Takaharu Yaguchi, Takashi Furuya, Yury Korolev.

**Figure 1.** Figure 1: High-frequency input u(t) (left) and its derivative u ′ (t) (right). Here u(t) is generated as in (6) with K = 1, n = 6, aj = 1, bj = 0, and β = 0.5 [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗

**Figure 2.** Figure 2: High-frequency input u(x, y) (left) and the corresponding nonlinear p-Laplacian −div(|∇u| p−2∇u) with p = 1.2 (right). Here u(x, y) is generated as in (7) with K = 1, n = 9, aj = 1, bj = 1, and β = 0. max j∈[Ntrain] ∥Ak(uj )−Aˆ(uj )∥L2 ≈ τ∞ log   N Xtrain j=1 exp ∥Ak(uj ) − Aˆ(uj )∥L2 τ∞ !  (soft ℓ ∞ loss), (9) and a graph-distance-based loss defined via the soft approximation maxn sup i∈[Ntrain] inf j… view at source ↗

read the original abstract

Operator learning has been highly successful for continuous mappings between infinite-dimensional spaces, such as PDE solution operators. However, many operators of interest-including differential operators-are discontinuous or set-valued, and lie outside classical approximation frameworks. We propose a paradigm shift by formulating approximation via graph convergence (Painlev\'e-Kuratowski convergence), which is well-suited for closed operators. We show that uniform and $L^p$ approximation are fundamentally inadequate in this setting. Focusing on maximally monotone operators, we prove that any such operator can be approximated in the sense of local graph convergence by continuous encoder-decoder architectures, and further construct structure-preserving approximations that retain maximal monotonicity via resolvent-based parameterizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows any maximally monotone operator can be approximated locally in the graph sense by continuous encoder-decoder maps and gives a resolvent construction that keeps monotonicity.

read the letter

The main point is that uniform or L^p norms do not work for closed set-valued operators, so the authors switch to local graph convergence and prove that continuous neural architectures can still approximate any maximally monotone operator in that sense. They also give an explicit resolvent-based parameterization that preserves maximal monotonicity. That is the actual new piece: the formulation and the structure-preserving construction rest on standard facts about resolvents and Painlevé-Kuratowski convergence rather than on new operator-learning tricks. The abstract is internally consistent on why the usual norms fail and why graph convergence is the right replacement. The existence claim and the resolvent route look like they follow from classical monotone-operator theory, so the math is on solid ground where it is stated. The soft spots are that everything stays at the existence level with no rates, no discretization details, and no numerical checks visible. It is not obvious how the local graph approximation behaves once you discretize or train on data, and the encoder-decoder step is asserted without showing how the graph is actually encoded. The assumption that the target is maximally monotone is taken as given, which limits the scope. This is for people already working on operator learning who want to move past continuous maps into variational problems. A reader who cares about the theoretical extension will find the perspective useful; someone looking for ready-to-use algorithms will not. It deserves a serious referee because the core claim is new, the reasoning is coherent, and the gaps are fixable with more concrete work rather than fatal.

Referee Report

2 major / 3 minor

Summary. The manuscript develops a framework for approximating maximally monotone operators using graph convergence (Painlevé-Kuratowski) instead of uniform or L^p norms, which are shown to be inadequate for closed set-valued operators. It proves existence of local graph-convergence approximations by continuous encoder-decoder architectures and provides explicit resolvent-based constructions that preserve maximal monotonicity.

Significance. If the central existence results and constructions hold, the work supplies a theoretically grounded extension of operator learning to discontinuous and set-valued operators arising in PDEs and optimization. The explicit resolvent parameterizations and emphasis on structure preservation are concrete strengths that could support downstream numerical work.

major comments (2)

[Section 4] The local graph convergence result (likely Theorem 3.1 or 4.2) establishes approximation but does not address whether the approximating operators converge in the sense of resolvents or Yosida regularizations; this is load-bearing for applications to proximal algorithms and should be stated explicitly or shown via an additional corollary.
[Theorem 5.3] The encoder-decoder construction appears to rely on density arguments in the graph topology, yet the manuscript does not quantify the modulus of continuity or the dimension of the latent space needed to achieve a prescribed graph-convergence tolerance; without such control the existence statement remains non-constructive for practical purposes.

minor comments (3)

[Abstract and §2] The abstract and introduction use 'local graph convergence' without a self-contained definition or pointer to the precise metric; add a short paragraph in §2.
[Throughout] Notation for the graph G(A) and the resolvent J_λ is introduced inconsistently across sections; standardize and include a notation table.
[Figure 1] Figure 1 caption should clarify whether the plotted sets are exact graphs or numerical approximations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the detailed comments, which have helped us strengthen the manuscript. We address each major comment below.

read point-by-point responses

Referee: [Section 4] The local graph convergence result (likely Theorem 3.1 or 4.2) establishes approximation but does not address whether the approximating operators converge in the sense of resolvents or Yosida regularizations; this is load-bearing for applications to proximal algorithms and should be stated explicitly or shown via an additional corollary.

Authors: We agree that explicit resolvent convergence is important for proximal algorithms. The structure-preserving constructions already produce maximally monotone operators, and local graph convergence of maximal monotone operators implies resolvent convergence in the strong topology. We have added Corollary 4.3, which states this implication with a short proof using the Minty parametrization of the graphs. revision: yes
Referee: [Theorem 5.3] The encoder-decoder construction appears to rely on density arguments in the graph topology, yet the manuscript does not quantify the modulus of continuity or the dimension of the latent space needed to achieve a prescribed graph-convergence tolerance; without such control the existence statement remains non-constructive for practical purposes.

Authors: The referee correctly observes that the proof of Theorem 5.3 is existential via density and supplies no explicit modulus or latent-dimension bound. The paper's focus is the theoretical existence result rather than quantitative rates, which would require additional regularity assumptions on the operator. We have inserted a remark after Theorem 5.3 that acknowledges the non-constructive character and indicates how the latent dimension may be chosen in practice by appealing to known approximation rates for continuous functions on compact sets. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper establishes an existence result: any maximally monotone operator admits local graph-convergence approximation by continuous encoder-decoder maps, together with an explicit resolvent-based construction that preserves maximal monotonicity. These claims rest on standard properties of monotone operators, resolvents, and Painlevé-Kuratowski convergence in reflexive Banach spaces. No step reduces by definition to its own inputs, no parameter is fitted on a subset and then relabeled as a prediction, and no load-bearing premise is justified solely by self-citation. The argument that uniform and L^p notions are inadequate for closed set-valued operators follows directly from the definition of those convergences and is internally consistent without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the definition and properties of maximally monotone operators and the Painlevé-Kuratowski graph convergence, both taken from prior literature. No new free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Maximally monotone operators are closed and satisfy the standard monotonicity inequality.
Invoked when stating that any such operator can be approximated.
domain assumption Local graph convergence is a suitable notion of approximation for discontinuous operators.
Central to the paradigm shift claimed in the abstract.

pith-pipeline@v0.9.0 · 5413 in / 1232 out tokens · 32500 ms · 2026-05-13T07:15:13.079220+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1]

Lecture Notes in Applied and Computational Mechanics

Vincent Acary and Bernard Brogliato.Numerical methods for nonsmooth dynamical systems: Applications in mechanics and electronics. Lecture Notes in Applied and Computational Mechanics. Springer, Berlin, Germany, 2008

work page 2008
[2]

Preservation or not of the maximally monotone property by graph-convergence

Samir Adly, Hédy Attouch, and Ralph Tyrrell Rockafellar. Preservation or not of the maximally monotone property by graph-convergence. 2022

work page 2022
[3]

Sorting out lipschitz function approximation

Cem Anil, James Lucas, and Roger Grosse. Sorting out lipschitz function approximation. In International conference on machine learning, pages 291–301. PMLR, 2019

work page 2019
[4]

A tour of the theory of absolutely minimizing functions.Bulletin of the American mathematical society, 41(4):439–505, 2004

Gunnar Aronsson, Michael Crandall, and Petri Juutinen. A tour of the theory of absolutely minimizing functions.Bulletin of the American mathematical society, 41(4):439–505, 2004

work page 2004
[5]

Heinz H Bauschke, Xianfu Wang, and Liangjin Yao. Examples of discontinuous maximal monotone linear operators and the solution to a recent problem posed by bf svaiter.Journal of Mathematical Analysis and Applications, 370(1):224–241, 2010

work page 2010
[6]

Convex analysis and monotone operator theory in hilbert spaces, 2011

HH Bauschke. Convex analysis and monotone operator theory in hilbert spaces, 2011

work page 2011
[7]

Learning truly monotone operators with applications to nonlinear inverse problems.SIAM Journal on Imaging Sciences, 18(1):735–764, 2025

Younes Belkouchi, Jean-Christophe Pesquet, Audrey Repetti, and Hugues Talbot. Learning truly monotone operators with applications to nonlinear inverse problems.SIAM Journal on Imaging Sciences, 18(1):735–764, 2025

work page 2025
[8]

Differential equations with maximal monotone operators.Journal of Mathematical Analysis and Applications, 539(1):128484, 2024

Irene Benedetti, Luisa Malaguti, and Manuel DP Monteiro Marques. Differential equations with maximal monotone operators.Journal of Mathematical Analysis and Applications, 539(1):128484, 2024

work page 2024
[9]

Modern regularization methods for inverse problems.Acta numerica, 27:1–111, 2018

Martin Benning and Martin Burger. Modern regularization methods for inverse problems.Acta numerica, 27:1–111, 2018

work page 2018
[10]

Model reduction and neural networks for parametric pdes.The SMAI journal of computational mathematics, 7:121–157, 2021

Kaushik Bhattacharya, Bamdad Hosseini, Nikola B Kovachki, and Andrew M Stuart. Model reduction and neural networks for parametric pdes.The SMAI journal of computational mathematics, 7:121–157, 2021

work page 2021
[11]

DeepMoD: Deep learning for model discovery in noisy data.J

Gert-Jan Both, Subham Choudhury, Pierre Sens, and Remy Kusters. DeepMoD: Deep learning for model discovery in noisy data.J. Comput. Phys., 428(109985):109985, March 2021

work page 2021
[12]

A mathematical guide to operator learning

Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. InHandbook of Numerical Analysis, volume 25, pages 83–125. Elsevier, 2024

work page 2024
[13]

Learning firmly nonex- pansive operators.arXiv:2407.14156, 2024

Kristian Bredies, Jonathan Chirinos-Rodriguez, and Emanuele Naldi. Learning firmly nonex- pansive operators.arXiv:2407.14156, 2024

work page arXiv 2024
[14]

Elsevier, 1973

Haim Brezis.Operateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, volume 5. Elsevier, 1973

work page 1973
[15]

Asymptotic profiles of nonlinear homogeneous evolution equations of gradient flow type.Journal of Evolution Equations, 20(3):1061–1092, 2020

Leon Bungert and Martin Burger. Asymptotic profiles of nonlinear homogeneous evolution equations of gradient flow type.Journal of Evolution Equations, 20(3):1061–1092, 2020

work page 2020
[16]

Gradient flows and nonlinear power methods for the com- putation of nonlinear eigenfunctions

Leon Bungert and Martin Burger. Gradient flows and nonlinear power methods for the com- putation of nonlinear eigenfunctions. InHandbook of numerical analysis, volume 23, pages 427–465. Elsevier, 2022

work page 2022
[17]

Eigenvalue problems inL∞: optimality conditions, duality, and relations with optimal transport.Communications of the American Mathematical Society, 2(08):345–373, 2022

Leon Bungert and Yury Korolev. Eigenvalue problems inL∞: optimality conditions, duality, and relations with optimal transport.Communications of the American Mathematical Society, 2(08):345–373, 2022

work page 2022
[18]

Introduction to nonlinear spectral analysis.arXiv:2506.08754, 2025

Leon Bungert and Yury Korolev. Introduction to nonlinear spectral analysis.arXiv:2506.08754, 2025. 10

work page arXiv 2025
[19]

Physics-informed learning of governing equations from scarce data.Nat

Zhao Chen, Yang Liu, and Hao Sun. Physics-informed learning of governing equations from scarce data.Nat. Commun., 12(1):6136, October 2021

work page 2021
[20]

G-convergence of monotone operators

Valeria Chiado’Piat, Gianni Dal Maso, and Anneliese Defranceschi. G-convergence of monotone operators. InAnnales de l’Institut Henri Poincaré C, Analyse non linéaire, volume 7, pages 123–160. Elsevier, 1990

work page 1990
[21]

Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of control, signals and systems, 2(4):303–314, 1989

work page 1989
[22]

Springer Science & Business Media, 1996

Heinz Werner Engl, Martin Hanke, and Andreas Neubauer.Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996

work page 1996
[23]

New universal operator approximation theorem for encoder- decoder architectures (preprint).arXiv:2503.24092, 2025

Janek Gödeke and Pascal Fernsel. New universal operator approximation theorem for encoder- decoder architectures (preprint).arXiv:2503.24092, 2025

work page arXiv 2025
[24]

Approximating continuous functions by relu nets of minimal width.arXiv:1710.11278, 2017

Boris Hanin and Mark Sellke. Approximating continuous functions by relu nets of minimal width.arXiv:1710.11278, 2017

work page arXiv 2017
[25]

Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

work page 1989
[26]

CFO: Learning continuous-time PDE dynamics via flow-matched neural operators

Xianglong Hou, Xinquan Huang, and Paris Perdikaris. CFO: Learning continuous-time PDE dynamics via flow-matched neural operators. InThe Fourteenth International Conference on Learning Representations, 2025

work page 2025
[27]

Institut Mittag-Leffler, 1999

Petri Juutinen, Peter Lindqvist, and Juan J Manfredi.The infinity Laplacian: examples and observations. Institut Mittag-Leffler, 1999

work page 1999
[28]

Two-layer neural networks with values in a banach space.SIAM Journal on Mathematical Analysis, 54(6):6358–6389, 2022

Yury Korolev. Two-layer neural networks with values in a banach space.SIAM Journal on Mathematical Analysis, 54(6):6358–6389, 2022

work page 2022
[29]

On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for fourier neural operators.Journal of Machine Learning Research, 22(290):1–76, 2021

work page 2021
[30]

Neural operator: Learning maps between function spaces with applications to pdes.Journal of Machine Learning Research, 24(89):1–97, 2023

Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes.Journal of Machine Learning Research, 24(89):1–97, 2023

work page 2023
[31]

Operator learning: Algorithms and analysis.Handbook of Numerical Analysis, 25:419–467, 2024

Nikola B Kovachki, Samuel Lanthaler, and Andrew M Stuart. Operator learning: Algorithms and analysis.Handbook of Numerical Analysis, 25:419–467, 2024

work page 2024
[32]

Nonlocality and nonlinearity implies universality in operator learning.Constructive Approximation, 62(2):261–303, 2025

Samuel Lanthaler, Zongyi Li, and Andrew M Stuart. Nonlocality and nonlinearity implies universality in operator learning.Constructive Approximation, 62(2):261–303, 2025

work page 2025
[33]

Error estimates for deeponets: A deep learning framework in infinite dimensions.Transactions of Mathematics and its Applications, 6(1):tnac001, 2022

Samuel Lanthaler, Siddhartha Mishra, and George E Karniadakis. Error estimates for deeponets: A deep learning framework in infinite dimensions.Transactions of Mathematics and its Applications, 6(1):tnac001, 2022

work page 2022
[34]

PDE-net: Learning PDEs from data

Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-net: Learning PDEs from data. InInternational Conference on Machine Learning, pages 3208–3216. PMLR, July 2018

work page 2018
[35]

DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Lu Lu, Pengzhan Jin, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv:1910.03193, 2019

work page internal anchor Pith review arXiv 1910
[36]

Spectral Normalization for Generative Adversarial Networks

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks.arXiv:1802.05957, 2018

work page Pith review arXiv 2018
[37]

Approximation theory for 1-lipschitz resnets.arXiv:2505.12003, 2025

Davide Murari, Takashi Furuya, and Carola-Bibiane Schönlieb. Approximation theory for 1-lipschitz resnets.arXiv:2505.12003, 2025

work page arXiv 2025
[38]

Nelsen and Yunan Yang

Nicholas H. Nelsen and Yunan Yang. Operator learning meets inverse problems: A probabilistic perspective.arxiv:2508.20207, 2025. 11

work page arXiv 2025
[39]

Approximation of lipschitz functions using deep spline neural networks.SIAM Journal on Mathematics of Data Science, 5(2):306–322, 2023

Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, and Michael Unser. Approximation of lipschitz functions using deep spline neural networks.SIAM Journal on Mathematics of Data Science, 5(2):306–322, 2023

work page 2023
[40]

Learning maxi- mally monotone operators for image recovery.SIAM Journal on Imaging Sciences, 14(3):1206– 1237, 2021

Jean-Christophe Pesquet, Audrey Repetti, Matthieu Terris, and Yves Wiaux. Learning maxi- mally monotone operators for image recovery.SIAM Journal on Imaging Sciences, 14(3):1206– 1237, 2021

work page 2021
[41]

Deep hidden physics models: deep learning of nonlinear partial differential equations.J

Maziar Raissi. Deep hidden physics models: deep learning of nonlinear partial differential equations.J. Mach. Learn. Res., 19(1):932–955, 2018

work page 2018
[42]

Springer, 1998

R Tyrrell Rockafellar and Roger JB Wets.Variational analysis. Springer, 1998

work page 1998
[43]

Variational methods in imaging, volume 167

Otmar Scherzer, Markus Grasmair, Harald Grossauer, Markus Haltmeier, and Frank Lenzen. Variational methods in imaging, volume 167. Springer

work page
[44]

Finite element analysis of the duct flow of bingham plastic fluids: an application of the variational inequality.Int

Yeh Wang. Finite element analysis of the duct flow of bingham plastic fluids: an application of the variational inequality.Int. J. Numer. Methods Fluids, 25(9):1025–1042, 1997

work page 1997
[45]

Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017

Dmitry Yarotsky. Error bounds for approximations with deep relu networks.Neural networks, 94:103–114, 2017. 12 A Proofs A.1 Proof of Example 2 Proof.Suppose, for contradiction, there is a continuousA ε :D(A ε)⊂H→HwithK⊂D(A ε) sup u∈K ∥Aε(u)−A(u)∥ L2(0,1) ≤ε. In particular, ∥Aε(0)∥L2(0,1) =∥A ε(0)−A(0)∥ L2(0,1) ≤ε. We observe that since∥Av n∥L2(0,1) = π√ 2...

work page arXiv 2017