Multiple Neural Operators Achieve Near-Optimal Rates for Multi-Task Learning
Pith reviewed 2026-05-22 07:28 UTC · model grok-4.3
The pith
Collections of Lipschitz operator maps can be learned jointly with multiple neural operators at near-optimal rates that match single-task learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For broad classes of Lipschitz multiple operator maps, the Multiple Neural Operators architecture delivers near-optimal upper bounds on approximation error and statistical generalization. Matching lower bounds establish minimax rates that exhibit a curse of parametric complexity. Together, these results indicate that jointly learning multiple operators incurs no extra cost beyond single-operator learning and therefore follows the same scaling laws.
What carries the argument
The Multiple Neural Operators (MNO) architecture, which learns collections of operators through shared representations while respecting the Lipschitz condition on the joint map.
If this is right
- Multi-task operator learning achieves the same near-optimal approximation rates as single-task operator learning.
- Statistical generalization bounds remain near-optimal and identical in scaling to the single-task case.
- Shared representations across tasks produce no increase in overall parametric complexity.
- The MNO architecture and the multi-task DeepONet extension satisfy essentially the same asymptotic rates from a worst-case perspective.
Where Pith is reading between the lines
- The result suggests that joint training on several related simulation tasks could be performed with sample complexity no higher than for one task alone.
- If real-world operator collections satisfy the Lipschitz condition, the bounds would justify multi-task training pipelines in scientific machine-learning applications.
- The equivalence of rates invites direct empirical checks on concrete operator-learning benchmarks such as fluid flow or elasticity maps.
Load-bearing premise
The collections of target operators must belong to the broad classes of Lipschitz multiple operator maps.
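One common way to formalize such a premise, offered here as an assumption rather than the paper's exact definition, is joint Lipschitz continuity in the task parameter α and the input function u. The norms and the constant L below are placeholder notation; W, U, and V follow the spaces named in the appendix excerpt.

```latex
\| G[\alpha][u] - G[\alpha'][u'] \|_{V}
  \le L \left( \| \alpha - \alpha' \|_{W} + \| u - u' \|_{U} \right)
  \quad \text{for all } (\alpha, u),\ (\alpha', u') \in W \times U .
```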
What would settle it
A measured scaling of approximation or generalization error that grows strictly faster with the number of tasks than the single-task rate would contradict the claimed equivalence.
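One way to run such a check, sketched here under the assumption that errors are summarized by a power law in the sample size (the paper's actual rates may take a different form), is to fit a log-log slope for each task count and compare the exponents. The data are synthetic placeholders, not results from the paper, and the names fit_rate, sample_sizes, and rates are illustrative.

```python
import numpy as np

def fit_rate(sample_sizes, errors):
    """Least-squares slope of log(error) against log(n), i.e. the fitted exponent."""
    slope, _intercept = np.polyfit(np.log(sample_sizes), np.log(errors), 1)
    return slope

# Synthetic placeholder data, NOT results from the paper: error = C * n^(-0.5),
# with a task-dependent constant C but the same exponent for every task count.
rng = np.random.default_rng(0)
sample_sizes = np.array([1e2, 1e3, 1e4, 1e5])
rates = {}
for num_tasks in (1, 4, 16):
    constant = 1.0 + 0.1 * num_tasks
    noise = np.exp(rng.normal(0.0, 0.01, size=sample_sizes.size))
    errors = constant * sample_sizes ** -0.5 * noise
    rates[num_tasks] = fit_rate(sample_sizes, errors)

# Under the claimed equivalence, fitted exponents should agree across task counts.
spread = max(rates.values()) - min(rates.values())
```

On real measurements, a spread in fitted exponents that widens systematically as the task count grows would be the kind of evidence described above; a task-dependent constant alone would not.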
Original abstract
We study the approximation and statistical complexity of learning collections of operators in a shared multi-task setting, with a focus on the Multiple Neural Operators (MNO) architecture. For broad classes of Lipschitz multiple operator maps, we derive near-optimal upper bounds for approximation and statistical generalization. On the lower-bound side, we establish a curse of parametric complexity and prove corresponding minimax rates. Together, these results show that shared representations across tasks do not increase the overall cost: multi-task operator learning follows the same scaling laws as single operator learning. We also compare MNO with a multi-task extension of DeepONet based on concatenated task inputs and show that, from a worst-case approximation-complexity perspective, both architectures satisfy essentially the same asymptotic rates.
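As a rough illustration of the concatenated-input DeepONet extension, the sketch below evaluates the double-sum form quoted in the appendix excerpt for Lemma 3.21, the sum over k and l of theta_kl b_k(M_W(alpha), M_U(u)) tau_l(x), and checks that regrouping the sums gives the same value. The point-sampling encoders, the single tanh branch layer, the sine trunk, and all sizes and weights are assumptions chosen only to make the example runnable; they are not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: sensors per input function, branch width H, trunk width N.
m_w, m_u, H, N = 5, 7, 4, 3

def encode(f, num_sensors):
    """Assumed encoder: sample the function at equispaced sensors on [0, 1]."""
    return f(np.linspace(0.0, 1.0, num_sensors))

# Random weights stand in for trained parameters.
A = rng.normal(size=(H, m_w + m_u))   # branch weights on the concatenated encodings
C = rng.normal(size=(N,))             # trunk frequencies
theta = rng.normal(size=(H, N))       # mixing coefficients theta_{k, l}

def branch(alpha_enc, u_enc):
    """b(M_W(alpha), M_U(u)): one tanh layer on concatenated task and input encodings."""
    return np.tanh(A @ np.concatenate([alpha_enc, u_enc]))

def trunk(x):
    """tau(x): N trunk features at a query point x."""
    return np.sin(C * x)

def deeponet(alpha, u, x):
    """Double sum: sum_k sum_l theta_{kl} b_k(...) tau_l(x)."""
    b = branch(encode(alpha, m_w), encode(u, m_u))
    return b @ theta @ trunk(x)

def deeponet_regrouped(alpha, u, x):
    """Same value with the inner sum taken first: tau_tilde(x)^T b(...)."""
    tau_tilde = theta @ trunk(x)
    return tau_tilde @ branch(encode(alpha, m_w), encode(u, m_u))

alpha = lambda s: 1.0 + 0.5 * s       # placeholder task-parameter function
u = lambda s: np.sin(2 * np.pi * s)   # placeholder input function
value = deeponet(alpha, u, 0.3)
value_regrouped = deeponet_regrouped(alpha, u, 0.3)
```

The task parameter enters only through the concatenated branch input, which is the design choice the abstract contrasts with MNO.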
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the approximation and statistical complexity of multi-task operator learning with a focus on the Multiple Neural Operators (MNO) architecture. For broad classes of Lipschitz multiple operator maps, it derives near-optimal upper bounds on approximation and generalization error. It establishes matching minimax lower bounds that exhibit a curse of parametric complexity, showing that shared representations across tasks incur no extra cost and that multi-task operator learning obeys the same scaling laws as the single-task case. The paper also compares MNO to a concatenated-input multi-task extension of DeepONet and concludes that both architectures achieve essentially the same asymptotic rates from a worst-case perspective.
Significance. If the upper and lower bounds are correctly derived, the result would be significant for the theory of neural operators and multi-task learning. It supplies a rigorous justification that joint learning of operator collections does not inflate sample or parameter complexity beyond the single-operator baseline, which could inform architecture design in scientific machine learning applications involving families of related PDEs or dynamical systems. The explicit comparison to a multi-task DeepONet variant adds practical value by identifying two architectures with comparable worst-case guarantees.
minor comments (3)
- The abstract and introduction use the phrase 'near-optimal' without immediately stating the precise rate (e.g., the dependence on the number of tasks, the Lipschitz constant, or the dimension of the input function space). Adding a one-sentence summary of the achieved rate would improve readability.
- Section 2 (or the related-work subsection) would benefit from an explicit citation to the single-task operator-learning rates that the multi-task bounds are claimed to match, so that the 'same scaling laws' statement can be checked at a glance.
- In the statement of the main upper-bound theorem, the dependence of the constant on the number of tasks T should be written out explicitly rather than absorbed into big-O notation, to make the 'no extra cost' claim immediately verifiable.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We appreciate the recognition of the significance of our results on near-optimal rates for multi-task operator learning and the comparison to the multi-task DeepONet extension. We will incorporate any minor suggestions to improve clarity and presentation in the revised manuscript.
Circularity Check
No significant circularity detected
The paper derives near-optimal approximation and statistical upper bounds for collections of Lipschitz multiple operator maps, along with matching minimax lower bounds that establish shared representations incur no extra cost relative to single-task operator learning. These results rely on standard approximation theory and statistical learning arguments conditioned explicitly on the Lipschitz multiple-operator class, without reducing any claimed prediction or rate to a fitted parameter, self-defined quantity, or load-bearing self-citation. The comparison to concatenated-input multi-task DeepONet is likewise framed in terms of asymptotic rates under the same class, with no step that equates an output to its own input by construction. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Collections of operators belong to broad classes of Lipschitz multiple operator maps
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
For broad classes of Lipschitz multiple operator maps, we derive near-optimal upper bounds for approximation and statistical generalization... multi-task operator learning follows the same scaling laws as single operator learning.
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
We extend the lower-complexity framework of [26] to the multiple operator setting... curse of parametric complexity
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Ben Adcock, Michael Griebel, and Gregor Maier. The sample complexity of learning lipschitz operators with respect to gaussian measures, 2025
- [2] Aras Bacho, Aleksei G. Sorokin, Xianjin Yang, Théo Bourdais, Edoardo Calvello, Matthieu Darcy, Alexander Hsu, Bamdad Hosseini, and Houman Owhadi. Operator learning at machine precision, 2025
- [3] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B. Kovachki, and Andrew M. Stuart. Model Reduction And Neural Networks For Parametric PDEs. The SMAI Journal of computational mathematics, 7:121–157, 2021
- [4] Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley Osher. Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction. arXiv preprint arXiv:2411.16063, 2024
- [5] Javier Castro. The kolmogorov infinite dimensional equation in a hilbert space via deep learning methods. Journal of Mathematical Analysis and Applications, 527(2):127413, 2023
- [6] Javier Castro, Claudio Muñoz, and Nicolás Valenzuela. The calderón’s problem via deeponets. Vietnam Journal of Mathematics, 52(3):775–806, 2024
- [7] T. Chen and H. Chen. Approximations of continuous functionals by neural networks with application to dynamic systems. IEEE Transactions on Neural Networks, 4(6):910–918, 1993
- [8] Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995
- [9] Stephan Dahlke, Filippo De Mari, Philipp Grohs, and Demetrio Labate, editors. Harmonic and Applied Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham, 2015
- [10] Maarten V. de Hoop, Daniel Zhengyu Huang, Elizabeth Qian, and Andrew M. Stuart. The cost-accuracy trade-off in operator learning with neural networks, 2022
- [11] D. L. Donoho. Sparse components of images and optimal atomic decompositions. Constructive Approximation, 17(3):353–382, 2001
- [12] Takashi Furuya, Michael Anthony Puthawala, Matti Lassas, and Maarten V. de Hoop. Globally injective and bijective neural operators. In Thirty-seventh Conference on Neural Information Processing Systems, 2023
- [13] Philipp Grohs, Samuel Lanthaler, and Margaret Trautner. Theory-to-practice gap for neural networks and neural operators, 2025
- [14] Maximilian Herde, Bogdan Raonic, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bezenac, and Siddhartha Mishra. Poseidon: Efficient foundation models for PDEs. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
- [15] Lukas Herrmann, Christoph Schwab, and Jakob Zech. Neural and spectral operator surrogates: unified construction and expression rate bounds. Advances in Computational Mathematics, 50(4):72, 2024
- [16] Daniel Zhengyu Huang, Nicholas H. Nelsen, and Margaret Trautner. An operator learning perspective on parameter-to-observable maps. Foundations of Data Science, 7(1):163–225, 2025
- [17] Pengzhan Jin, Shuai Meng, and Lu Lu. Mionet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing, 44(6):A3490–A3514, 2022
- [18] Derek Jollie, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. Time-series forecasting and refinement within a multimodal pde foundation model. Journal of Machine Learning for Modeling and Computing, 6(2):77–89, 2025
- [19] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020
- [20] Nikola Kovachki, Samuel Lanthaler, and Siddhartha Mishra. On universal approximation and error bounds for fourier neural operators. J. Mach. Learn. Res., 22(1), January 2021
- [21] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: learning maps between function spaces with applications to pdes. J. Mach. Learn. Res., 24(1), January 2023
- [22] Nikola B. Kovachki, Samuel Lanthaler, and Hrushikesh Mhaskar. Data complexity estimates for operator learning, 2024
- [23] Nikola B. Kovachki, Samuel Lanthaler, and Andrew M. Stuart. Chapter 9 - operator learning: Algorithms and analysis. In Siddhartha Mishra and Alex Townsend, editors, Numerical Analysis Meets Machine Learning, volume 25 of Handbook of Numerical Analysis, pages 419–467. Elsevier, 2024
- [24] Samuel Lanthaler. Operator learning with pca-net: upper and lower complexity bounds. J. Mach. Learn. Res., 24(1), January 2023
- [25] Samuel Lanthaler, Siddhartha Mishra, and George E Karniadakis. Error estimates for deeponets: a deep learning framework in infinite dimensions. Transactions of Mathematics and Its Applications, 6(1):tnac001, 03 2022
- [26] Samuel Lanthaler and Andrew M Stuart. The parametric complexity of operator learning. IMA Journal of Numerical Analysis, page draf028, 08 2025
- [27] Jose Antonio Lara Benitez, Takashi Furuya, Florian Faucher, Anastasis Kratsios, Xavier Tricoche, and Maarten V. de Hoop. Out-of-distributional risk bounds for neural operators with applications to the helmholtz equation. J. Comput. Phys., 513(C), September 2024
- [28] Hao Liu, Jiahui Cheng, and Wenjing Liao. Deep neural networks are adaptive to function regularity and data distribution in approximation and estimation. Journal of Machine Learning Research, 26(213):1–56, 2025
- [29] Hao Liu, Biraj Dahal, Rongjie Lai, and Wenjing Liao. Generalization error guaranteed auto-encoder-based nonlinear model reduction for operator learning. Applied and Computational Harmonic Analysis, 74:101717, 2025
- [30] Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, and Wenjing Liao. Deep nonparametric estimation of operators between infinite dimensional spaces. J. Mach. Learn. Res., 25(1), January 2024
- [31] Hao Liu, Zecheng Zhang, Wenjing Liao, and Hayden Schaeffer. Neural scaling laws of deep relu and deep operator network: A theoretical study, 2024
- [32] Yuxuan Liu, Jingmin Sun, Xinjie He, Griffin Pinney, Zecheng Zhang, and Hayden Schaeffer. PROSE-FD: A multimodal PDE foundation model for learning multiple operators for forecasting fluid dynamics. In Neurips 2024 Workshop Foundation Models for Science: Progress, Opportunities, and Challenges, 2024
- [33] Yuxuan Liu, Jingmin Sun, and Hayden Schaeffer. Bcat: A block causal transformer for pde foundation models for fluid dynamics. arXiv preprint arXiv:2501.18972, 2025
- [34] Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Prose: Predicting multiple operators and symbolic expressions using multimodal transformers. Neural Networks, 180:106707, 2024
- [35] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021
- [36] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George Em Karniadakis. A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022
- [37] Carlo Marcati and Christoph Schwab. Exponential convergence of deep operator networks for elliptic partial differential equations. SIAM Journal on Numerical Analysis, 61(3):1513–1545, 2023
- [38] Carlo Marcati and Christoph Schwab. Expression rates of neural operators for linear elliptic pdes in polytopes. CoRR, abs/2409.17552, 2024
- [39] Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Géraud Krawezik, Francois Lanusse, et al. Multiple physics pretraining for physical surrogate models. arXiv preprint arXiv:2310.02994, 2023
- [40] Elisa Negrini, Yuxuan Liu, Liu Yang, Stanley J Osher, and Hayden Schaeffer. A multimodal pde foundation model for prediction and scientific text descriptions. arXiv preprint arXiv:2502.06026, 2025
- [41] Philipp Petersen and Felix Voigtlaender. Optimal approximation of piecewise smooth functions using deep relu neural networks. Neural Networks, 108:296–330, 2018
- [42] Christoph Schwab, Andreas Stein, and Jakob Zech. Deep operator network approximation rates for lipschitz operators. Analysis and Applications, 24(01):199–239, 2026
- [43] Jingmin Sun, Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Towards a foundation model for partial differential equations: Multioperator learning and extrapolation. Physical Review E, 111(3):035304, 2025
- [44] Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. Lemon: Learning to learn multi-operator networks, 2025
- [45] Zhuoyuan Wang, Hanjiang Hu, Xiyu Deng, Saviz Mowlavi, and Yorie Nakahira. Opinf-llm: Parametric pde solving with llms via operator inference, 2026
- [46] Adrien Weihs and Hayden Schaeffer. Generalization bounds and statistical guarantees for multi-task and multiple operator learning with mno networks, 2026
- [47] Adrien Weihs, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. A deep learning framework for multi-operator learning: Architectures and approximation theory, 2025
- [48] Liu Yang, Siting Liu, Tingwei Meng, and Stanley J Osher. In-context operator learning with data prompts for differential equation problems. Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023
- [49] Liu Yang, Tingwei Meng, Siting Liu, and Stanley J Osher. Prompting in-context operator learning with sensor data, equations, and natural language. arXiv preprint arXiv:2308.05061, 2023
- [50] Zhanhong Ye, Zining Liu, Bingyang Wu, Hongjie Jiang, Leheng Chen, Minyan Zhang, Xiang Huang, Qinghe Meng Zou, Hongsheng Liu, and Bin Dong. Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations. arXiv preprint arXiv:2507.15409, 2025
- [51] Benjamin J Zhang, Siting Liu, Stanley J Osher, and Markos A Katsoulakis. Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations. arXiv preprint arXiv:2509.05186, 2025
- [52] Zecheng Zhang. Modno: Multi-operator learning with distributed neural operators. Computer Methods in Applied Mechanics and Engineering, 431:117229, 2024
- [53] Zecheng Zhang, Wing Tat Leung, and Hayden Schaeffer. A discretization-invariant extension and analysis of some deep operator networks. Journal of Computational and Applied Mathematics, 456:116226, 2025
- [54] Zecheng Zhang, Hao Liu, Wenjing Liao, and Guang Lin. Coefficient-to-basis network: a fine-tunable operator learning framework for inverse problems with adaptive discretizations and theoretical guarantees. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 383(2305):20240054, 09 2025
- [55] Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, and Hayden Schaeffer. D2no: Efficient handling of heterogeneous input function spaces with distributed deep neural operators. Computer Methods in Applied Mechanics and Engineering, 428:117084, 2024
- [56] Zecheng Zhang, Leung Wing Tat, and Hayden Schaeffer. Belnet: basis enhanced learning, a mesh-free neural operator. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479(2276):20230043, 2023
Appendix
In this section, we present detailed proofs of all our results.
A Near-Optimal Approximation Rates
Proof of Theorem 3.1. For...
From part 1, we may take r = 1 to obtain G: L rG(ΩW) × L rG(ΩU) → V which is Fréchet differentiable on L rG(ΩW) × L rG(ΩU). Specifically, from the proof of Corollary 3.14, we know that G[α][u](x) = F(α)ϕ(x), where F: L rG(ΩW) → R is the Fréchet differentiable functional provided by [26, Theorem 2.11], and ϕ ∈ V is a fixed nontrivial function. Next, the proof of [26, Lemma...
The lower bound in (16) is given by combining parts 1 and 2 of the theorem. The upper bound is a direct consequence of Theorem 3.1.
C An Extension of DeepONet to Multi-Task Learning
Proof of Lemma 3.21. Let (α, u) ∈ W × U and x ∈ Ω_V. Then $\mathrm{ev}_x \circ NN[\alpha][u] = \sum_{k=1}^{H} \sum_{\ell=1}^{N} \theta_{k\ell}\, b_k(M_W(\alpha), M_U(u))\, \tau_\ell(x) = \sum_{k=1}^{H} \left( \sum_{\ell=1}^{N} \theta_{k\ell}\, \tau_\ell(x) \right) b_k(M_W(\alpha), M_U(u)) =: \tilde{\tau}(x)^{\top} b(M_W(\alpha), M_U(u$...
From part 1, we may take r = 1 to obtain G: L rG(ΩW) × L rG(ΩU) → V which is Fréchet differentiable on L rG(ΩW) × L rG(ΩU). Specifically, from the proof of [26, Corollary 2.12], we know that G[α][u](x) = F(α, u)ϕ(x), where F: L rG(ΩW) × L rG(ΩU) → R is the Fréchet differentiable functional provided by [26, Theorem 2.11], and ϕ ∈ V is a fixed nontrivial function. Next, the pr...
The lower bound in (16) is given by combining parts 1 and 2 of the theorem. The upper bound is a direct consequence of Proposition 3.25 and Remark 3.26.