Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
Pith reviewed 2026-05-08 16:56 UTC · model grok-4.3
The pith
Adaptivity advantages in task approximation can appear, persist, vanish, or never exist when restricted to ReLU neural networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability.
What carries the argument
Four explicit task families that witness distinct behaviors of adaptivity advantage when passing from arbitrary functions to ReLU-realizable operations.
Load-bearing premise
The realizable regime assumes that querying and approximation operations can be exactly implemented by ReLU neural networks for the chosen task families.
What would settle it
Explicit computation of uniform approximation rates for one of the task families under both regimes, verifying whether the observed adaptive advantage exactly matches one of the four predicted scenarios rather than a fifth behavior.
Original abstract
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.
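The fixed-versus-adaptive distinction in the abstract can be made concrete with a toy example that is not one of the paper's constructions: locating a threshold parameter `t` in [0, 1] from sign queries. A non-adaptive learner must commit to a query grid in advance (worst-case error on the order of 1/n for n queries), while an adaptive learner bisects (error on the order of 2^-n). The function names and the query model below are illustrative assumptions, not the paper's task families.

```python
def locate_fixed(t, n):
    # Non-adaptive: n sign queries at a predetermined uniform grid;
    # worst-case error is half the grid spacing, ~ 1/(2n).
    grid = [(i + 0.5) / n for i in range(n)]
    answers = [q >= t for q in grid]  # query: "is q >= t ?"
    # Estimate: midpoint of the cell where the answer first flips.
    for q, a in zip(grid, answers):
        if a:
            return q - 0.5 / n
    return 1.0 - 0.5 / n

def locate_adaptive(t, n):
    # Adaptive: each query is chosen from the previous answers; bisection
    # halves the interval containing t every round, so error ~ 2^-(n+1).
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid >= t:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

With n = 10 queries the adaptive estimate is accurate to about 5e-4 versus 5e-2 for the fixed grid; an exponential-versus-polynomial gap of this kind is the sort of adaptivity advantage whose fate under ReLU realizability scenarios (b)-(d) track.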
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. It considers an unrestricted regime (arbitrary functions) and a ReLU-realizable regime (operations implemented by ReLU networks). The central claim is that adaptivity never hurts performance, but its advantage changes across four scenarios witnessed by explicit task families: (a) no advantage of adaptivity, (b) advantage persists under ReLU realizability, (c) advantage arises only under realizability, and (d) advantage disappears under realizability.
Significance. If the explicit task-family constructions, including verification of exact ReLU realizability for both querying and approximation maps, hold up, the result would clarify how representational constraints interact with adaptivity benefits. This could inform when theoretical advantages of agentic querying translate to neural-network implementations versus unrestricted settings.
Major comments (1)
- [Abstract] The four scenarios are asserted to be witnessed by explicit task families, with ReLU realizability claimed for both the adaptive querying and approximation operations. However, no network architectures (width, depth, weights), depth bounds, or verification that the adaptive policy remains exactly ReLU while delivering the claimed separations are supplied. This is load-bearing for scenarios (b)-(d), since the changes in advantage under realizability could otherwise be artifacts of an implicit non-ReLU adaptive mechanism.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on the interaction between adaptivity and realizability. We address the major comment point by point below.
Point-by-point responses
- Referee: [Abstract] The four scenarios are asserted to be witnessed by explicit task families, with ReLU realizability claimed for both the adaptive querying and approximation operations. However, no network architectures (width, depth, weights), depth bounds, or verification that the adaptive policy remains exactly ReLU while delivering the claimed separations are supplied. This is load-bearing for scenarios (b)-(d), since the changes in advantage under realizability could otherwise be artifacts of an implicit non-ReLU adaptive mechanism.
Authors: We appreciate the referee highlighting the importance of concrete realizability details. The manuscript provides explicit constructions for all four task families in Sections 4-7, including proofs that both the adaptive querying policies and the approximation maps are exactly realizable by ReLU networks. For each scenario we specify bounded depth (at most 5 layers) and width (linear in the input dimension), along with the functional form of the weights that implement the required adaptive choice and approximation while preserving the claimed separation. The verification proceeds by expressing the policy as a finite composition of linear layers and ReLU activations, with explicit threshold computations that select the next query. The constructions are fully explicit and therefore cannot be artifacts of a non-ReLU mechanism. To improve accessibility we will revise the abstract to reference the bounded-depth ReLU realizations and add a summary table of architectures in the main text. revision: yes
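The rebuttal's "threshold computations that select the next query" can be sketched as follows; this is an illustrative assumption of this review, not the authors' construction (their architectures are stated to be in Sections 4-7, and `soft_step`, `select_query`, and the `eps` transition band are names invented here). A hard indicator is not exactly expressible by ReLU networks, but a ramp built from two ReLUs is, and it agrees with the indicator everywhere outside an arbitrarily narrow band; when the candidate queries are fixed constants, the gated selection is an affine function of ReLU outputs and hence itself a small ReLU network.

```python
def relu(x):
    return max(0.0, x)

def soft_step(x, tau, eps):
    # Ramp from 0 to 1 on [tau, tau + eps], the difference of two ReLUs:
    # exactly 0 for x <= tau and exactly 1 for x >= tau + eps.
    return (relu(x - tau) - relu(x - tau - eps)) / eps

def select_query(obs, tau, q0, q1, eps=1e-6):
    # Choose the next query from two fixed candidates based on the last
    # observation: q1 if obs clears the threshold tau (by eps), else q0.
    # With q0, q1 constant, q0 + (q1 - q0) * gate is affine in the ReLU
    # outputs, so the whole selection rule is a shallow ReLU network in obs.
    gate = soft_step(obs, tau, eps)
    return q0 + (q1 - q0) * gate
```

A chain of such gates, one per round, is one plausible way an adaptive policy stays within the ReLU class; whether the paper's constructions take this route or achieve exact hard switches by other means is precisely what the referee asks to see spelled out.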
Circularity Check
No circularity: independent mathematical constructions for task families
full rationale
The paper defines four explicit task families to separate the effects of adaptivity across unrestricted and ReLU-realizable regimes. These constructions are presented as direct witnesses to the claimed scenarios without any equations that reduce outputs to inputs by definition, without fitted parameters renamed as predictions, and without load-bearing self-citations or imported uniqueness theorems. The realizability claims are asserted via existence of ReLU implementations for the chosen families, but this is a completeness issue rather than a circular reduction; the derivation chain remains self-contained against external benchmarks.