arxiv: 2602.19837 · v3 · submitted 2026-02-23 · 💻 cs.AI · cs.LG

Recognition: no theorem link

Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent

Bj\"orn Hoppmann , Christoph Scholz

Authors on Pith no claims yet

Pith reviewed 2026-05-15 20:42 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords meta-learningmeta-reinforcement learningAdaptive Agenttask-based formalizationsurveygeneralist AIrapid adaptation

0 comments

The pith

A task-based formalization of meta-learning organizes the algorithms leading to DeepMind's Adaptive Agent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey defines meta-learning and meta-reinforcement learning through a consistent task-based lens, where models acquire knowledge transferable across a distribution of tasks to enable fast adaptation on new ones. It then applies this definition to review the sequence of landmark algorithms that built the capabilities seen in the Adaptive Agent. A reader would care because the formalization explains how AI systems can move beyond task-specific training to handle novel challenges with minimal new data, much like humans do. The result consolidates the conceptual steps required for generalist agents that adapt efficiently.

Core claim

The paper provides a rigorous task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

What carries the argument

The task-based formalization, which structures meta-learning problems around distributions of tasks to capture transferable knowledge for rapid adaptation to new tasks.

If this is right

The formalization supplies a consistent structure for comparing different meta-learning methods.
It isolates the conceptual milestones required to reach generalist adaptation capabilities.
Future algorithm design can be guided by the progression identified in the chronicle.
The approach highlights how prior task experience enables efficient handling of novel tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same formalization could be applied to evaluate adaptation in current large-scale models outside the surveyed lineage.
It suggests a way to design benchmarks that test transferable knowledge across explicitly defined task distributions.
Extending the chronicle might reveal under-explored branches in meta-RL that connect to other adaptation techniques.

Load-bearing premise

The selected landmark algorithms accurately represent the essential conceptual path to the Adaptive Agent without major omissions or selection bias.

What would settle it

Identifying a major algorithm essential to the Adaptive Agent's development that cannot be described within the task-based formalization or was omitted from the chronicle.

Figures

Figures reproduced from arXiv: 2602.19837 by Bj\"orn Hoppmann, Christoph Scholz.

**Figure 2.** Figure 2: General Meta-Training. In each iteration a new task [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: General Meta-Testing Paradigm. For each Tj sampled from the set of test tasks the parameters θj (φ) are fine-tuned in K shots, before the resulting θ ′ j get evaluated on the task’s test set via the task-specific loss Lj to yield the performance. numbers N, Nval and Ntest of training, validation, and testing tasks are themselves hyperparameters of meta-training. After meta-training a meta-model fφ, meta-te… view at source ↗

**Figure 4.** Figure 4: Meta-Reinforcement learning to race on tracks with varying weather conditions. Starting from the [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: The MAML meta-training scheme. that MAML learns effective feature representations rather than a rapidly adaptable prior: During taskspecific training, the initial layers of the underlying network exhibit minimal changes, suggesting that the fundamental feature representations remain stable. Probably, this is why, MAML is more stable than other landmarks, especialy VariBAD (see Section 3.3), which is illus… view at source ↗

**Figure 6.** Figure 6: General Multi-Task-Learning Paradigm: Multi-Task Learning Paradigm Formally, in MTL a model is jointly trained to solve a given number N of tasks (T1, T2, . . . , Tn) sharing some common structure. The corresponding training process mimics standard learning on task-specific level, but with parameters being shared throughout the tasks. A meta-level does not exist and hence there is no two-staged training pr… view at source ↗

**Figure 7.** Figure 7: The PEARL meta-training scheme. In each MDP [PITH_FULL_IMAGE:figures/full_fig_p042_7.png] view at source ↗

read the original abstract

Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward survey that formalizes meta-learning around tasks and traces selected algorithms to DeepMind's Adaptive Agent, but it adds no new results or proofs.

read the letter

The paper's core move is to lay out a task-based formalization of meta-learning and meta-RL, then use it to walk through a sequence of prior algorithms that lead toward DeepMind's Adaptive Agent. That framing is the main thing it contributes: a single lens for connecting ideas that often sit in separate papers. It does a decent job of citing the relevant literature and keeping the narrative focused on adaptation with limited data, which is the practical payoff of the field. Readers who want a compact map of how these methods evolved will probably find it useful as a starting point. The formalization itself looks like an attempt to make the concepts more precise without introducing new math, which keeps the paper grounded in existing work. The main limitation is that it remains a survey. There are no fresh experiments, no new theorems, and no quantitative claims to test, so its value rests entirely on how clearly and completely it organizes what is already known. The choice of which algorithms count as landmarks could reflect the authors' view of the path, and without exhaustive coverage it is hard to rule out gaps. That is a normal risk for this type of paper rather than a fatal flaw. The work is aimed at people entering meta-learning or trying to understand the conceptual steps behind generalist agents. Specialists already familiar with the cited papers will not gain much. It is worth sending to peer review because a clear, well-referenced consolidation can still help the community even when it does not break new ground.

Referee Report

0 major / 2 minor

Summary. The paper provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating essential concepts for understanding generalist approaches.

Significance. If the formalization is precise and the chronicle representative, the survey would consolidate key ideas in meta-learning, serving as a useful reference for tracing progression toward adaptive agents and highlighting transferable knowledge acquisition across tasks.

minor comments (2)

[Abstract] Abstract: the claim of a 'rigorous' formalization requires explicit mathematical definitions of the task-based paradigm (e.g., task distribution, adaptation objective) in §2 or §3 to permit independent verification.
[Chronicle of algorithms] Section on algorithm selection: state explicit inclusion criteria for landmark algorithms to address potential selection bias in the path to the Adaptive Agent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. The report highlights the value of our task-based formalization and chronicle of algorithms leading to DeepMind's Adaptive Agent. No specific major comments were provided in the report, so we have no individual points to address at this time. We are happy to incorporate any minor suggestions if they are supplied in a subsequent round.

Circularity Check

0 steps flagged

Survey paper exhibits no circularity: formalization and chronicle rest on external citations

full rationale

This is a survey paper whose central contribution is a task-based formalization of meta-learning/meta-RL together with a chronological narrative of selected prior algorithms leading to DeepMind's Adaptive Agent. No original theorems, empirical predictions, fitted parameters, or derivations are asserted that could reduce to the paper's own inputs by construction. All landmark algorithms and concepts are explicitly referenced to prior published work by other authors. The selection of landmarks is presented as consolidation rather than a proof, so no load-bearing self-citation or self-definitional loop exists. The paper is self-contained against external benchmarks via its citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The survey relies on standard definitions from machine learning literature for tasks and learning processes; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption Task-based formalization of learning processes is a valid and useful organizing principle for meta-learning.
The abstract states that the survey uses a task-based formalization to structure the field.

pith-pipeline@v0.9.0 · 5391 in / 1021 out tokens · 22447 ms · 2026-05-15T20:42:05.842459+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

208 extracted references · 208 canonical work pages · 16 internal anchors

[1]

Multitask Learning.Machine Learning, 28(1)

work page
[2]

The illustrated transformer.http://jalammar.github.io/illustrated- transformer/, 2018

Jay Alammar. The illustrated transformer.http://jalammar.github.io/illustrated- transformer/, 2018. Accessed: 2025-07-01

work page 2018
[3]

Schneider, Tomas Lozano-Perez, and Leslie Pack Kaelbling

Ferran Alet, Martin F. Schneider, Tomas Lozano-Perez, and Leslie Pack Kaelbling. Meta-learning curiosity algorithms, March 2020. ADS Bibcode: 2020arXiv200305325A

work page 2020
[4]

Alsaleh, Eid Albalawi, Abdulelah Algosaibi, Salman S

Aqilah M. Alsaleh, Eid Albalawi, Abdulelah Algosaibi, Salman S. Albakheet, and Surbhi Bhatia Khan. Few-Shot Learning for Medical Image Segmentation Using 3D U-Net and Model-Agnostic Meta-Learning (MAML).Diagnostics, 14(12):1213, January 2024. Number: 12 Publisher: Multi- disciplinary Digital Publishing Institute

work page 2024
[5]

Zeilinger

ElenaArcari, MariaVittoriaMinniti, AnnaScampicchio, AndreaCarron, FarbodFarshidian, Marco Hutter, and Melanie N. Zeilinger. Bayesian Multi-Task Learning MPC for Robotic Mobile Manip- ulation.IEEE Robotics and Automation Letters, 8(6):3222–3229, June 2023

work page 2023
[6]

When MAML Can Adapt Fast and How to Assist When It Cannot

Sébastien Arnold, Shariq Iqbal, and Fei Sha. When MAML Can Adapt Fast and How to Assist When It Cannot. InProceedings of The 24th International Conference on Artificial Intelligence and Statistics, pages 244–252. PMLR, March 2021. ISSN: 2640-3498

work page 2021
[7]

Lobo, Pablo Garcia-Bringas, and Javier Del Ser

Marcos Barcina-Blanco, Jesus L. Lobo, Pablo Garcia-Bringas, and Javier Del Ser. Managing the unknown in machine learning: Definitions, related areas, recent advances, and prospects.Neuro- computing, 599:128073, September 2024

work page 2024
[8]

Hypernetworks in Meta-Reinforcement Learning

Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, and Shimon Whiteson. Hypernetworks in Meta-Reinforcement Learning. InProceedings of The 6th Conference on Robot Learning, pages 1478–1487. PMLR, March 2023. ISSN: 2640-3498

work page 2023
[9]

A Survey of Meta-Reinforcement Learning, January 2023

JacobBeck, RistoVuorio, EvanZheranLiu, ZhengXiong, LuisaZintgraf, ChelseaFinn, andShimon Whiteson. A Survey of Meta-Reinforcement Learning, January 2023. arXiv:2301.08028 [cs]

work page arXiv 2023
[10]

Harkirat Singh Behl, Atılım Gueneş Baydin, and Philip H. S. Torr. Alpha MAML: Adaptive Model-Agnostic Meta-Learning, May 2019. arXiv:1905.07435 [cs]

work page arXiv 2019
[11]

M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The Arcade Learning Environment: An Evaluation Platform for General Agents.Journal of Artificial Intelligence Research, 47:253–279, June 2013

work page 2013
[12]

Ketz, Praveen K

Eseoghene Ben-Iwhiwhu, Jeffery Dick, Nicholas A. Ketz, Praveen K. Pilly, and Andrea Soltoggio. Context meta-reinforcement learning via neuromodulation.Neural Networks, 152:70–79, August 2022

work page 2022
[13]

Bharat Rao

Jinbo Bi, Tao Xiong, Shipeng Yu, Murat Dundar, and R. Bharat Rao. An Improved Multi-task Learning Approach with Applications in Medical Diagnosis. In Walter Daelemans, Bart Goethals, and Katharina Morik, editors,Machine Learning and Knowledge Discovery in Databases, pages 117–132, Berlin, Heidelberg, 2008. Springer

work page 2008
[14]

Context-Based Meta-Reinforcement Learn- ing With Bayesian Nonparametric Models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(10):6948–6965, October 2024

Zhenshan Bing, Yuqi Yun, Kai Huang, and Alois Knoll. Context-Based Meta-Reinforcement Learn- ing With Bayesian Nonparametric Models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(10):6948–6965, October 2024

work page 2024
[15]

Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks

Benjamin Bischke, Patrick Helber, Joachim Folz, Damian Borth, and Andreas Dengel. Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks. In2019 IEEE International Conference on Image Processing (ICIP), pages 1480–1484, September 2019. ISSN: 2381-8549

work page 2019
[16]

Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shya- mal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, St...

work page 2021
[17]

Semi-Supervised Few-Shot Learning with MAML

Rinu Boney and Alexander Ilin. Semi-Supervised Few-Shot Learning with MAML. January 2018

work page 2018
[18]

Language Models are Few-Shot Learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari- wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott ...

work page 1901
[19]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

Visibility into AI Agents

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, NitarshanRajkumar, DavidKrueger, NoamKolt, LennartHeim, andMarkusAnderljung. Visibility into AI Agents. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pages 958–973, New York, NY, USA, June 2024. Association fo...

work page 2024
[21]

Rashid, and Kuldip K

Mayank Chaturvedi, Mahmood A. Rashid, and Kuldip K. Paliwal. Transformers in RNA structure prediction: A review.Computational and Structural Biotechnology Journal, 27:1187–1203, January 2025

work page 2025
[22]

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs.ACM Comput

Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, and Bruno Castro da Silva. RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs.ACM Comput. Surv., June

work page
[23]

MAML MOT: Multiple Object Tracking Based on Meta-Learning

Jiayi Chen and Chunhua Deng. MAML MOT: Multiple Object Tracking Based on Meta-Learning. In2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 4542– 4547, October 2024

work page 2024
[24]

Multi-Task Learning in Natural Language Processing: An Overview.ACM Comput

Shijie Chen, Yu Zhang, and Qiang Yang. Multi-Task Learning in Natural Language Processing: An Overview.ACM Comput. Surv., 56(12):295:1–295:32, July 2024

work page 2024
[25]

Meta-LSTR: Meta-Learning with Long Short-Term Transformer for futures volatility prediction.Expert Systems with Applications, 265:125926, March 2025

Yunzhu Chen, Neng Ye, Wenyu Zhang, Jiaqi Fan, Shahid Mumtaz, and Xiangming Li. Meta-LSTR: Meta-Learning with Long Short-Term Transformer for futures volatility prediction.Expert Systems with Applications, 265:125926, March 2025

work page 2025
[26]

AI-GAs: AI-generatingalgorithms, analternateparadigmforproducinggeneralartificial intelligence, January 2020

JeffClune. AI-GAs: AI-generatingalgorithms, analternateparadigmforproducinggeneralartificial intelligence, January 2020. arXiv:1905.10985 [cs]. 25

work page arXiv 2020
[27]

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, and Ruslan Salakhutdinov. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Anna Korhonen, David Traum, and Lluís Màrquez, editors,Proceedings of the 57th Annual Meeting of the Associ- ation for Computational Linguistics, pages 2978–2988, Florence, Italy, July 2019. ...

work page 2019
[28]

The Development of a Learning Strategies Curriculum1

DONALD Dansereau. The Development of a Learning Strategies Curriculum1. In HAROLD F. O’neil, editor,Learning Strategies, pages 1–29. Academic Press, January 1978

work page 1978
[29]

Multi-task policy search for robotics

Marc Peter Deisenroth, Peter Englert, Jan Peters, and Dieter Fox. Multi-task policy search for robotics. In2014 IEEE International Conference on Robotics and Automation (ICRA), pages 3876–3881, May 2014. ISSN: 1050-4729

work page 2014
[30]

Emergent complexity and zero-shot transfer via unsupervised envi- ronment design, 2021

Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine. Emergent complexity and zero-shot transfer via unsupervised envi- ronment design, 2021

work page 2021
[31]

Sharing Knowl- edge in Multi-Task Deep Reinforcement Learning, January 2024

Carlo D’Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, and Jan Peters. Sharing Knowl- edge in Multi-Task Deep Reinforcement Learning, January 2024. arXiv:2401.09561 [cs]

work page arXiv 2024
[32]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volu...

work page 2019
[33]

Offline Meta Reinforcement Learning – Identifi- ability Challenges and Effective Data Collection Strategies

Ron Dorfman, Idan Shenfeld, and Aviv Tamar. Offline Meta Reinforcement Learning – Identifi- ability Challenges and Effective Data Collection Strategies. InAdvances in Neural Information Processing Systems, volume 34, pages 4607–4618. Curran Associates, Inc., 2021

work page 2021
[34]

RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, November 2016. arXiv:1611.02779 [cs, stat]

work page internal anchor Pith review Pith/arXiv arXiv 2016
[35]

Worrawat Duanyai, Weon Keun Song, Poom Konghuayrob, and Manukid Parnichkun. Event- triggered model reference adaptive control system design for SISO plants using meta-learning- based physics-informed neural networks without labeled data and transfer learning.Interna- tional Journal of Adaptive Control and Signal Processing, 38(4):1442–1456, 2024. _eprint:...

work page doi:10.1002/acs.3758 2024
[36]

Neural Architecture Search: A Survey

Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural Architecture Search: A Survey. Journal of Machine Learning Research, 20(55):1–21, 2019

work page 2019
[37]

Hospedales

Linus Ericsson, Henry Gouk, Chen Change Loy, and Timothy M. Hospedales. Self-Supervised Rep- resentation Learning: Introduction, advances, and challenges.IEEE Signal Processing Magazine, 39(3):42–62, May 2022

work page 2022
[38]

Novoa, Justin Ko, Susan M

Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, February 2017. Publisher: Nature Publishing Group

work page 2017
[39]

Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, and Alexander J. Smola. Meta-Q-Learning, April

work page
[40]

arXiv:1910.00125 [cs, stat]

work page arXiv 1910
[41]

On the Convergence Theory of Gradient- BasedModel-AgnosticMeta-LearningAlgorithms

Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. On the Convergence Theory of Gradient- BasedModel-AgnosticMeta-LearningAlgorithms. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 1082–1092. PMLR, June 2020. ISSN: 2640-3498

work page 2020
[42]

Short-term Load Forecasting of Distribution Transformer Supply Zones Based on Federated Model-Agnostic Meta Learning.IEEE Transactions on Power Systems, 40(1):31–45, January 2025

Changsen Feng, Liang Shao, Jiaying Wang, Youbing Zhang, and Fushuan Wen. Short-term Load Forecasting of Distribution Transformer Supply Zones Based on Federated Model-Agnostic Meta Learning.IEEE Transactions on Power Systems, 40(1):31–45, January 2025. 26

work page 2025
[43]

Model-Agnostic Meta-Learning for Fast Adap- tation of Deep Networks

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adap- tation of Deep Networks. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learning, volume 70 ofProceedings of Machine Learning Research, pages 1126–1135. PMLR, August 2017

work page 2017
[44]

Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

Chelsea Finn and Sergey Levine. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm, February 2018. arXiv:1710.11622 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[45]

Probabilistic Model-Agnostic Meta-Learning

Chelsea Finn, Kelvin Xu, and Sergey Levine. Probabilistic Model-Agnostic Meta-Learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

work page 2018
[46]

One-Shot Visual Imitation Learning via Meta-Learning

Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-Shot Visual Imitation Learning via Meta-Learning. InProceedings of the 1st Annual Conference on Robot Learning, pages 357–368. PMLR, October 2017. ISSN: 2640-3498

work page 2017
[47]

Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research, 44(5):701–739, April 2025

Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, and Mac Schwager. Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research, 44(5):701–739, April 2025. Pu...

work page 2025
[48]

DiCE: The Infinitely Differentiable Monte Carlo Estimator

Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktaeschel, Eric Xing, and Shimon Whiteson. DiCE: The Infinitely Differentiable Monte Carlo Estimator. InProceedings of the 35th International Conference on Machine Learning, pages 1529–1538. PMLR, July 2018. ISSN: 2640- 3498

work page 2018
[49]

An Introduction to Deep Reinforcement Learning

Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, and Joelle Pineau. An Introduction to Deep Reinforcement Learning.Foundations and Trends in Machine Learning, 11(3-4):219–354, 2018. arXiv:1811.12560 [cs, stat]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[50]

MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

Yuan Gao, Haoping Bai, Zequn Jie, Jiayi Ma, Kui Jia, and Wei Liu. MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning. pages 11543–11552, 2020

work page 2020
[51]

Adaptive guidance and integrated navigation with reinforcement meta-learning.Acta Astronautica, 169:180–190, April 2020

Brian Gaudet, Richard Linares, and Roberto Furfaro. Adaptive guidance and integrated navigation with reinforcement meta-learning.Acta Astronautica, 169:180–190, April 2020

work page 2020
[52]

Hassan Gharoun, Fereshteh Momenifar, Fang Chen, and Amir H. Gandomi. Meta-learning Ap- proaches for Few-Shot Learning: A Survey of Recent Advances.ACM Comput. Surv., 56(12):294:1– 294:41, July 2024

work page 2024
[53]

Bayesian Reinforcement Learning: A Survey

Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Bayesian reinforcement learning: A survey.ArXiv, abs/1609.04436, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[54]

Deep sparse rectifier neural networks

Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors,Proceedings of the Fourteenth Inter- national Conference on Artificial Intelligence and Statistics, volume 15 ofProceedings of Machine Learning Research, pages 315–323, Fort Lauderdale, FL, USA, 11–13 Apr 2...

work page 2011
[55]

Maybank, and Dacheng Tao

Jianping Gou, Baosheng Yu, Stephen J. Maybank, and Dacheng Tao. Knowledge Distillation: A Survey.International Journal of Computer Vision, 129(6):1789–1819, June 2021

work page 2021
[56]

Recasting Gradient-Based Meta-Learning as Hierarchical Bayes

ErinGrant, ChelseaFinn, SergeyLevine, TrevorDarrell, andThomasGriffiths. RecastingGradient- Based Meta-Learning as Hierarchical Bayes, January 2018. arXiv:1801.08930 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[57]

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

Jake Grigsby, Linxi Fan, and Yuke Zhu. AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents. October 2023

work page 2023
[58]

A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9052–9071, December 2024

Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, and Dacheng Tao. A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9052–9071, December 2024. 27

work page 2024
[59]

Meta- Reinforcement Learning of Structured Exploration Strategies

Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine. Meta- Reinforcement Learning of Structured Exploration Strategies. InAdvances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018

work page 2018
[60]

Zhao, Vikash Kumar, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, and Sergey Levine

Abhishek Gupta, Justin Yu, Tony Z. Zhao, Vikash Kumar, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, and Sergey Levine. Reset-Free Reinforcement Learning via Multi-Task Learning: Learn- ing Dexterous Manipulation Behaviors without Human Intervention. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 6664–6671, May 2021. ISSN: 2577-087X

work page 2021
[61]

Extending the Capabilities of Re- inforcement Learning Through Curriculum: A Review of Methods and Applications.SN Computer Science, 3(1):28, October 2021

Kashish Gupta, Debasmita Mukherjee, and Homayoun Najjaran. Extending the Capabilities of Re- inforcement Learning Through Curriculum: A Review of Methods and Applications.SN Computer Science, 3(1):28, October 2021

work page 2021
[62]

Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 1861–1870. PMLR, 10–1...

work page 2018
[63]

Benchmarking the Spectrum of Agent Capabilities

Danijar Hafner. Benchmarking the Spectrum of Agent Capabilities. December 2021

work page 2021
[64]

A Survey on Vision Transformer.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):87– 110, January 2023

Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, and Dacheng Tao. A Survey on Vision Transformer.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):87– 110, January 2023

work page 2023
[65]

Kai He, Nan Pu, Mingrui Lao, and Michael S. Lew. Few-shot and meta-learning methods for image understanding: a survey.International Journal of Multimedia Information Retrieval, 12(2):14, June 2023

work page 2023
[66]

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.CoRR, abs/1502.01852, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[67]

Muesli: Combining Improvements in Policy Optimiza- tion

Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, and Hado Van Hasselt. Muesli: Combining Improvements in Policy Optimiza- tion. InProceedings of the 38th International Conference on Machine Learning, pages 4214–4226. PMLR, July 2021. ISSN: 2640-3498

work page 2021
[68]

Distilling the knowledge in a neural network, 2015

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network, 2015

work page 2015
[69]

Meta-Learning in Neural Networks: ASurvey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5149– 5169, September 2022

Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. Meta-Learning in Neural Networks: ASurvey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5149– 5169, September 2022. Conference Name: IEEE Transactions on Pattern Analysis and Machine Intelligence

work page 2022
[70]

On Transforming Reinforcement LearningWithTransformers: TheDevelopmentTrajectory.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8580–8599, December 2024

Shengchao Hu, Li Shen, Ya Zhang, Yixin Chen, and Dacheng Tao. On Transforming Reinforcement LearningWithTransformers: TheDevelopmentTrajectory.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8580–8599, December 2024

work page 2024
[71]

Toward Human- Centered Automated Driving: A Novel Spatiotemporal Vision Transformer-Enabled Head Tracker

Zhongxu Hu, Yiran Zhang, Yang Xing, Yifan Zhao, Dongpu Cao, and Chen Lv. Toward Human- Centered Automated Driving: A Novel Spatiotemporal Vision Transformer-Enabled Head Tracker. IEEE Vehicular Technology Magazine, 17(4):57–64, December 2022

work page 2022
[72]

Improving Transformer Optimiza- tion Through Better Initialization

Xiao Shi Huang, Felipe Perez, Jimmy Ba, and Maksims Volkovs. Improving Transformer Optimiza- tion Through Better Initialization. InProceedings of the 37th International Conference on Machine Learning, pages 4475–4483. PMLR, November 2020. ISSN: 2640-3498

work page 2020
[73]

van Rijn, and Aske Plaat

Mike Huisman, Jan N. van Rijn, and Aske Plaat. A survey of deep meta-learning.Artificial Intelligence Review, 54(6):4483–4541, August 2021

work page 2021
[74]

Un- supervised Curricula for Visual Meta-Reinforcement Learning

Allan Jabri, Kyle Hsu, Abhishek Gupta, Ben Eysenbach, Sergey Levine, and Chelsea Finn. Un- supervised Curricula for Visual Meta-Reinforcement Learning. InAdvances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. 28

work page 2019
[75]

Prioritized Level Replay

Minqi Jiang, Edward Grefenstette, and Tim Rocktaeschel. Prioritized Level Replay. InProceedings of the 38th International Conference on Machine Learning, pages 4940–4950. PMLR, July 2021. ISSN: 2640-3498

work page 2021
[76]

A Framework for Robot Manipula- tion: Skill Formalism, Meta Learning and Adaptive Control

Lars Johannsmeier, Malkin Gerchow, and Sami Haddadin. A Framework for Robot Manipula- tion: Skill Formalism, Meta Learning and Adaptive Control. In2019 International Conference on Robotics and Automation (ICRA), pages 5844–5850, May 2019. ISSN: 2577-087X

work page 2019
[77]

A Survey of Reinforcement Learning from Human Feedback, December 2023

Timo Kaufmann, Paul Weng, Viktor Bengs, and Eyke Huellermeier. A Survey of Reinforcement Learning from Human Feedback, December 2023. ADS Bibcode: 2023arXiv231214925K

work page 2023
[78]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017

work page 2017
[79]

ImageNet Classification with Deep Con- volutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Con- volutional Neural Networks. In F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012

work page 2012
[80]

Sim-to- Real Transfer for Quadrupedal Locomotion via Terrain Transformer

Hang Lai, Weinan Zhang, Xialin He, Chen Yu, Zheng Tian, Yong Yu, and Jun Wang. Sim-to- Real Transfer for Quadrupedal Locomotion via Terrain Transformer. In2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5141–5147, May 2023

work page 2023

Showing first 80 references.