Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
Pith reviewed 2026-05-08 16:56 UTC · model grok-4.3
The pith
Adaptivity advantages in task approximation can appear, persist, vanish, or never exist when restricted to ReLU neural networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability.
What carries the argument
Four explicit task families that witness distinct behaviors of adaptivity advantage when passing from arbitrary functions to ReLU-realizable operations.
Load-bearing premise
The realizable regime assumes that querying and approximation operations can be exactly implemented by ReLU neural networks for the chosen task families.
What would settle it
Explicit computation of uniform approximation rates for one of the task families under both regimes, verifying whether the observed adaptive advantage exactly matches one of the four predicted scenarios rather than a fifth behavior.
Original abstract
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.
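The fixed-versus-adaptive distinction in the abstract can be made concrete with a toy example that is not one of the paper's constructions: locating a threshold parameter `t` in [0, 1] from sign queries. A non-adaptive learner must commit to a query grid in advance (worst-case error on the order of 1/n for n queries), while an adaptive learner bisects (error on the order of 2^-n). The function names and the query model below are illustrative assumptions, not the paper's task families.

```python
def locate_fixed(t, n):
    # Non-adaptive: n sign queries at a predetermined uniform grid;
    # worst-case error is half the grid spacing, ~ 1/(2n).
    grid = [(i + 0.5) / n for i in range(n)]
    answers = [q >= t for q in grid]  # query: "is q >= t ?"
    # Estimate: midpoint of the cell where the answer first flips.
    for q, a in zip(grid, answers):
        if a:
            return q - 0.5 / n
    return 1.0 - 0.5 / n

def locate_adaptive(t, n):
    # Adaptive: each query is chosen from the previous answers; bisection
    # halves the interval containing t every round, so error ~ 2^-(n+1).
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid >= t:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

With n = 10 queries the adaptive estimate is accurate to about 5e-4 versus 5e-2 for the fixed grid; an exponential-versus-polynomial gap of this kind is the sort of adaptivity advantage whose fate under ReLU realizability scenarios (b)-(d) track.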
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. It considers an unrestricted regime (arbitrary functions) and a ReLU-realizable regime (operations implemented by ReLU networks). The central claim is that adaptivity never hurts performance, but its advantage changes across four scenarios witnessed by explicit task families: (a) no advantage of adaptivity, (b) advantage persists under ReLU realizability, (c) advantage arises only under realizability, and (d) advantage disappears under realizability.
Significance. If the explicit task-family constructions, including verification of exact ReLU realizability for both querying and approximation maps, hold up, the result would clarify how representational constraints interact with adaptivity benefits. This could inform when theoretical advantages of agentic querying translate to neural-network implementations versus unrestricted settings.
Major comments (1)
- [Abstract] The four scenarios are asserted to be witnessed by explicit task families, with ReLU realizability claimed for both the adaptive querying and approximation operations. However, no network architectures (width, depth, weights), depth bounds, or verification that the adaptive policy remains exactly ReLU while delivering the claimed separations are supplied. This is load-bearing for scenarios (b)-(d), since the changes in advantage under realizability could otherwise be artifacts of an implicit non-ReLU adaptive mechanism.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on the interaction between adaptivity and realizability. We address the major comment point by point below.
Point-by-point responses
- Referee: [Abstract] The four scenarios are asserted to be witnessed by explicit task families, with ReLU realizability claimed for both the adaptive querying and approximation operations. However, no network architectures (width, depth, weights), depth bounds, or verification that the adaptive policy remains exactly ReLU while delivering the claimed separations are supplied. This is load-bearing for scenarios (b)-(d), since the changes in advantage under realizability could otherwise be artifacts of an implicit non-ReLU adaptive mechanism.
Authors: We appreciate the referee highlighting the importance of concrete realizability details. The manuscript provides explicit constructions for all four task families in Sections 4-7, including proofs that both the adaptive querying policies and the approximation maps are exactly realizable by ReLU networks. For each scenario we specify bounded depth (at most 5 layers) and width (linear in the input dimension), along with the functional form of the weights that implement the required adaptive choice and approximation while preserving the claimed separation. The verification proceeds by expressing the policy as a finite composition of linear layers and ReLU activations, with explicit threshold computations that select the next query. The constructions are fully explicit and therefore cannot be artifacts of a non-ReLU mechanism. To improve accessibility we will revise the abstract to reference the bounded-depth ReLU realizations and add a summary table of architectures in the main text. revision: yes
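The rebuttal's "threshold computations that select the next query" can be sketched as follows; this is an illustrative assumption of this review, not the authors' construction (their architectures are stated to be in Sections 4-7, and `soft_step`, `select_query`, and the `eps` transition band are names invented here). A hard indicator is not exactly expressible by ReLU networks, but a ramp built from two ReLUs is, and it agrees with the indicator everywhere outside an arbitrarily narrow band; when the candidate queries are fixed constants, the gated selection is an affine function of ReLU outputs and hence itself a small ReLU network.

```python
def relu(x):
    return max(0.0, x)

def soft_step(x, tau, eps):
    # Ramp from 0 to 1 on [tau, tau + eps], the difference of two ReLUs:
    # exactly 0 for x <= tau and exactly 1 for x >= tau + eps.
    return (relu(x - tau) - relu(x - tau - eps)) / eps

def select_query(obs, tau, q0, q1, eps=1e-6):
    # Choose the next query from two fixed candidates based on the last
    # observation: q1 if obs clears the threshold tau (by eps), else q0.
    # With q0, q1 constant, q0 + (q1 - q0) * gate is affine in the ReLU
    # outputs, so the whole selection rule is a shallow ReLU network in obs.
    gate = soft_step(obs, tau, eps)
    return q0 + (q1 - q0) * gate
```

A chain of such gates, one per round, is one plausible way an adaptive policy stays within the ReLU class; whether the paper's constructions take this route or achieve exact hard switches by other means is precisely what the referee asks to see spelled out.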
Circularity Check
No circularity: independent mathematical constructions for task families
full rationale
The paper defines four explicit task families to separate the effects of adaptivity across unrestricted and ReLU-realizable regimes. These constructions are presented as direct witnesses to the claimed scenarios without any equations that reduce outputs to inputs by definition, without fitted parameters renamed as predictions, and without load-bearing self-citations or imported uniqueness theorems. The realizability claims are asserted via existence of ReLU implementations for the chosen families, but this is a completeness issue rather than a circular reduction; the derivation chain remains self-contained against external benchmarks.