arxiv: 2605.08007 · v1 · submitted 2026-05-08 · 💻 cs.LG

Interpreting Reinforcement Learning Agents with Susceptibilities

Chris Elliott, Daniel Murfet, David Quarel, Einar Urdshals

Pith reviewed 2026-05-11 02:43 UTC · model grok-4.3

classification 💻 cs.LG

keywords susceptibilities · reinforcement learning · interpretability · regret · gridworld · parameter space · activation steering · RLHF

The pith

Susceptibilities applied to regret detect internal stages of RL agent development in parameter space that cannot be seen from the learned policy alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper generalizes susceptibilities, which track how perturbing the loss changes posterior expectation values of observables, to the regret setting in deep reinforcement learning. In a gridworld environment where the agent learns in distinct stages, this technique identifies shifts in the agent's internal model parameters during training that remain invisible when one only tracks how the policy's behavior evolves. The work validates the signals by steering activations to match the detected features and sketches an extension to RLHF post-training. A sympathetic reader would care because surface-level policy inspection often misses why an agent develops one way rather than another, which limits diagnosis of training dynamics.

Core claim

Susceptibilities, defined as the response of posterior expectation values of observables to perturbations of the loss, when generalized to the regret incurred by a reinforcement learning agent, reveal features of the model's development in parameter space that cannot be detected by studying the development of the learned policy alone, as shown in a gridworld model with non-trivial stagewise progress.

What carries the argument

Susceptibilities, which quantify the sensitivity of posterior expectations of observables to small changes in the loss (here generalized to regret).
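
As a rough illustration of what such a sensitivity looks like numerically, the sketch below assumes a posterior proportional to exp(-nβ·L(w)) and uses the standard linear-response identity, under which the derivative of a posterior expectation with respect to the perturbation strength equals -nβ times the covariance of the observable with the perturbation. The function name `susceptibility`, the toy quadratic loss, and all parameter values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def susceptibility(phi_samples, delta_loss_samples, n, beta):
    """Estimate chi = -n * beta * Cov(phi, delta_loss) from posterior samples.

    phi_samples and delta_loss_samples hold the observable and the loss
    perturbation evaluated at the same posterior draws.
    """
    phi = np.asarray(phi_samples, dtype=float)
    dl = np.asarray(delta_loss_samples, dtype=float)
    cov = np.mean(phi * dl) - np.mean(phi) * np.mean(dl)
    return -n * beta * cov

# Toy check (assumed setup): L(w) = w^2 / 2 with a flat prior, so the
# posterior is Gaussian with mean 0 and variance 1 / (n * beta).
# Perturbing the loss by h * w shifts the posterior mean to -h, so the
# exact susceptibility of phi(w) = w to delta_loss(w) = w is -1.
n, beta = 100, 1.0
rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0 / np.sqrt(n * beta), size=200_000)

chi = susceptibility(w, w, n, beta)
print(chi)  # close to -1, up to sampling noise
```

In practice the posterior draws would have to come from an approximate sampler rather than an exact Gaussian, so the estimate inherits whatever bias that sampler has.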

If this is right

In the gridworld, susceptibilities pick up stagewise internal changes during training.
These changes are invisible when tracking only the policy's performance over time.
Activation steering can be used to confirm that the susceptibility signals correspond to real internal features.
The same construction is proposed as a route to interpretability in RLHF post-training.
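
To make the activation-steering check concrete, here is a minimal sketch of the general technique on a hypothetical two-layer policy network: a vector is added to the hidden activations and the change in the action logits is observed. The network, its random weights, the steering direction, and the scale `alpha` are all assumptions for illustration; the paper's architecture and its choice of steering directions are not reproduced here.

```python
import numpy as np

def policy_logits(x, W1, W2, steer=None):
    """Two-layer ReLU policy; optionally add a steering vector to the hidden layer."""
    h = np.maximum(W1 @ x, 0.0)
    if steer is not None:
        h = h + steer  # the intervention: shift hidden activations
    return W2 @ h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # hypothetical input-to-hidden weights
W2 = rng.normal(size=(3, 8))   # hypothetical hidden-to-action weights
x = rng.normal(size=4)         # hypothetical observation

# Assumed feature direction: the unit vector along the read-out row of action 0.
# In a real study this direction would come from the interpretability analysis.
direction = W2[0] / np.linalg.norm(W2[0])
alpha = 3.0

base_logits = policy_logits(x, W1, W2)
steered_logits = policy_logits(x, W1, W2, steer=alpha * direction)

# Steering along this direction raises the action-0 logit by alpha * ||W2[0]||.
print(softmax(base_logits), softmax(steered_logits))
```

If a direction proposed by an analysis produces the predicted behavioral change when added to the activations, that is evidence the direction corresponds to a real internal feature.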

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If susceptibilities work in this gridworld, they could be applied to compare two agents that reach similar final policies but took different internal routes.
The method might help diagnose when an agent's learning trajectory diverges from expectations even if its final behavior looks normal.
Testing the same observables on larger environments would show whether the hidden parameter-space stages persist beyond toy settings.

Load-bearing premise

The simple gridworld model with non-trivial stagewise development is representative enough that the susceptibilities technique will generalize usefully to regret in deep RL agents and to RLHF post-training.

What would settle it

In a deeper RL agent or an actual RLHF run, compute the susceptibilities and check whether the parameter-space features they identify can be confirmed by activation steering and are absent from the policy's learning curve. If steering cannot confirm them, or the learning curve already shows them, the claim does not extend beyond the toy setting.

Figures

Figures reproduced from arXiv: 2605.08007 by Chris Elliott, Daniel Murfet, David Quarel, Einar Urdshals.

**Figure 1.** Training dynamics for a model trained with α = 0.6. Top row: Cheese-in-corner environment; an RL agent (mouse) navigates to cheese (+1 reward). Initial states are colored by mouse position relative to the cheese (two rightmost panels); in a fraction 1 − α of environments cheese is in the top-left corner (top right panel), so all initial mouse positions are red/orange/yellow. Middle and lower row: Middle le…

**Figure 2.** Weight-restricted LLCs for the individual layers of the model. We see that in phase 1 (blue background), while the policy is "blind", the LLC is dominated by the two last layers. Then, as the model enters phase 2 (beige background) and learns to "see", the Conv layers activate and start to dominate the LLC. We note the LLC of all layers have a peak as the model enters phase 3 (magenta background). relative…

**Figure 3.** "Streaks" in phase 1 susceptibilities. In the left panels we see the states corresponding to the streaks, where the cheese is placed to the right of the top left corner and the mouse is located along the left column (top) where the cheese is placed below the top left corner and the mouse is located along the top row (bottom). The arrows are pointing to the linear pattern in the susceptibility plot in the …

**Figure 4.** Susceptibilities and LLC estimator compared to regret for a single training run

**Figure 5.** Comparison of the behavior of the four metrics and the unnormalized cluster

**Figure 6.** Similarity between susceptibilities of pairs of runs across two initialization seeds.

**Figure 7.** Susceptibilities and LLC and regret curves for models trained with

**Figure 8.** Susceptibilities and LLC and regret curves for models trained with

Original abstract

Susceptibilities are a technique for neural network interpretability that studies the response of posterior expectation values of observables to perturbations of the loss. We generalize this construction to the setting of the regret in deep reinforcement learning and investigate the utility of susceptibilities in a simple gridworld model that nevertheless exhibits non-trivial stagewise development. We argue that susceptibilities reveal internal features of the development of the model in parameter space that one cannot detect purely by studying the development of the learned policy. We validate these results with activation-steering, and discuss the framework's extension to RLHF post-training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adapts susceptibilities to RL regret and shows in a gridworld that they can flag parameter-space stages missed by policy inspection alone, but the toy setting limits what we can conclude for real agents.

The core move is taking susceptibilities from neural network interpretability and reworking them around the regret objective instead of a standard loss. In the gridworld they chose, which has explicit stagewise development, this lets them track how small perturbations to regret affect internal expectations in ways that do not show up when you just plot policy improvement over training. The activation-steering checks provide an external way to confirm that the susceptibilities are tracking something about the agent's internals rather than just re-describing the policy trajectory. That combination is the actual new piece, and it is executed cleanly enough to make the point in this controlled case.

The authors are careful to define the construction in terms of perturbations and to validate it separately, which avoids the most obvious circularity risks. The gridworld is simple but not trivial, so the demonstration has some bite.

The main soft spot is exactly the one the stress-test note flags: everything is shown in a single low-dimensional environment with clean stages. In deeper RL or RLHF settings the parameter and policy paths are usually more entangled, and it is not obvious that susceptibilities will surface features that careful policy analysis or standard metrics would miss. There are no scaling results or comparisons to other interpretability tools, so the practical payoff remains speculative.

This is the sort of paper that belongs in a reading group for people working on RL interpretability or alignment. It shows clear thinking about how to port an existing method and supplies a concrete check, even if the evidence is still narrow. I would send it to peer review rather than desk-reject; the idea is worth referee scrutiny and the authors can be asked to address the generalization question directly.

Referee Report

2 major / 2 minor

Summary. The paper generalizes susceptibilities—a technique that studies the response of posterior expectation values of observables to perturbations of the loss—to the setting of regret in deep reinforcement learning. Using a simple gridworld model that exhibits non-trivial stagewise development, the authors argue that susceptibilities reveal internal features of model development in parameter space that cannot be detected purely by studying the learned policy. Results are validated via activation-steering, and potential extensions to RLHF post-training are discussed.

Significance. If the susceptibilities approach proves robust, it could supply a new interpretability lens for RL that distinguishes parameter-space developmental trajectories from observable policy behavior, with possible utility for diagnosing training dynamics in RLHF. The toy gridworld allows controlled demonstration of stagewise effects, but the significance hinges on whether the method isolates genuinely hidden features beyond standard regret or policy metrics.

major comments (2)

[Gridworld Experiments] The central claim—that susceptibilities detect parameter-space features invisible to policy analysis—rests on the gridworld results, yet the manuscript provides no quantitative comparison (e.g., mutual information or divergence metrics) between susceptibility-derived features and those obtainable from policy trajectories or standard RL diagnostics such as per-stage regret curves.
[Validation and Activation Steering] Activation-steering validation is performed exclusively within the same toy gridworld; this does not address whether the technique isolates information beyond what careful inspection of the learned policy or conventional RL metrics already reveal in high-dimensional deep RL or RLHF regimes where parameter and policy trajectories are more entangled.

minor comments (2)

[Abstract] The abstract states the generalization and gridworld results but supplies no equations, quantitative metrics, error bars, or details on how susceptibilities are computed for regret; adding these would improve readability.
[Discussion] The discussion of extension to RLHF post-training is high-level; concrete challenges (e.g., scaling of posterior expectations or choice of observables) or a small-scale RLHF pilot would clarify feasibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and note planned changes to the manuscript.

Referee: [Gridworld Experiments] The central claim—that susceptibilities detect parameter-space features invisible to policy analysis—rests on the gridworld results, yet the manuscript provides no quantitative comparison (e.g., mutual information or divergence metrics) between susceptibility-derived features and those obtainable from policy trajectories or standard RL diagnostics such as per-stage regret curves.

Authors: We agree that the current presentation relies primarily on qualitative comparison and visualization. In the revised manuscript we will add explicit quantitative comparisons, including mutual information between susceptibility maps and policy-derived features as well as divergence metrics that contrast stage detection from susceptibilities against per-stage regret curves. These additions will directly quantify the incremental information provided by the susceptibilities approach.
Revision: yes

Referee: [Validation and Activation Steering] Activation-steering validation is performed exclusively within the same toy gridworld; this does not address whether the technique isolates information beyond what careful inspection of the learned policy or conventional RL metrics already reveal in high-dimensional deep RL or RLHF regimes where parameter and policy trajectories are more entangled.

Authors: The gridworld was selected precisely because it permits controlled observation of stagewise parameter-space development that remains hidden under policy inspection. Activation steering is used to validate the susceptibilities within this transparent setting. The manuscript does not claim or provide empirical results for high-dimensional RLHF; we will revise the discussion to state this scope limitation more explicitly while retaining the toy-model demonstration as a proof of concept.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper defines susceptibilities via the response of posterior expectations to loss perturbations, explicitly generalizes the construction to regret, and then empirically demonstrates its utility on a gridworld with stagewise development. The claim that susceptibilities detect parameter-space features invisible to policy inspection is supported by direct comparison in the toy setting plus independent activation-steering validation, not by any reduction to fitted parameters or self-referential definitions. No load-bearing step equates a prediction to its own inputs by construction, and no self-citation chain is invoked to force uniqueness.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the assumption that susceptibilities defined via loss perturbations can be meaningfully transferred to regret perturbations, and that the chosen gridworld exhibits representative stagewise development. No explicit free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: The response of posterior expectation values to perturbations of the loss can be generalized to perturbations of the regret in deep RL.
This is the core generalization stated in the abstract; its validity is not derived but assumed for the gridworld experiments.
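
One way to spell out what this assumption asks for is the usual linear-response calculation with the regret in place of the loss. The notation below (regret $G$, perturbation $\Delta G$, observable $\phi$, prior $\varphi$, sample size $n$, inverse temperature $\beta$, perturbation strength $h$) is introduced here for illustration only; the paper's own definitions and sign conventions may differ.

```latex
% Notation assumed for illustration; not taken from the paper.
p_h(w) = \frac{1}{Z_h}\,\exp\!\big(-n\beta\,[\,G(w) + h\,\Delta G(w)\,]\big)\,\varphi(w),
\qquad
\langle \phi \rangle_h = \int \phi(w)\,p_h(w)\,dw .

% Differentiating numerator and normalizer Z_h with respect to h at h = 0:
\chi
= \frac{\partial}{\partial h}\,\langle \phi \rangle_h \Big|_{h=0}
= -\,n\beta\,\Big( \langle \phi\,\Delta G \rangle_0
      - \langle \phi \rangle_0\,\langle \Delta G \rangle_0 \Big)
= -\,n\beta\;\mathrm{Cov}_0\big(\phi,\ \Delta G\big).
```

Read this way, the axiom amounts to assuming that a Gibbs-type posterior built from the regret is a meaningful object for the trained agent, which is the step the ledger flags as assumed rather than derived.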

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
We generalize this construction to the setting of the regret in deep reinforcement learning and investigate the utility of susceptibilities in a simple gridworld model...
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear
the LLC estimator tracks this stagewise development. Phase transitions are accompanied by rapid increases in the LLC...
