pith. machine review for the scientific record.

arxiv: 2602.19837 · v3 · submitted 2026-02-23 · 💻 cs.AI · cs.LG

Recognition: no theorem link

Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 20:42 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords meta-learning · meta-reinforcement learning · Adaptive Agent · task-based formalization · survey · generalist AI · rapid adaptation

The pith

A task-based formalization of meta-learning organizes the algorithms leading to DeepMind's Adaptive Agent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey defines meta-learning and meta-reinforcement learning through a consistent task-based lens, where models acquire knowledge transferable across a distribution of tasks to enable fast adaptation on new ones. It then applies this definition to review the sequence of landmark algorithms that built the capabilities seen in the Adaptive Agent. A reader would care because the formalization explains how AI systems can move beyond task-specific training to handle novel challenges with minimal new data, much like humans do. The result consolidates the conceptual steps required for generalist agents that adapt efficiently.

Core claim

The paper provides a rigorous task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

What carries the argument

The task-based formalization, which structures meta-learning problems around distributions of tasks to capture transferable knowledge for rapid adaptation to new tasks.
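Rendered in the standard notation of the meta-learning literature (the symbols below are generic conventions, not quoted from the paper), such a task-based objective reads:

```latex
% phi: meta-parameters shared across tasks; p(T): the task distribution.
% U_T: the adaptation procedure (e.g., K gradient steps on T's support set);
% L_T: the task-specific loss evaluated on T's query set.
\min_{\phi} \;\; \mathbb{E}_{T \sim p(T)}\!\left[ \mathcal{L}_{T}\!\left( U_{T}(\phi) \right) \right]
```

The meta-parameters are judged not by their direct loss on any task, but by the loss achieved after adapting them to a sampled task, which is what makes the acquired knowledge transferable.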

If this is right

  • The formalization supplies a consistent structure for comparing different meta-learning methods.
  • It isolates the conceptual milestones required to reach generalist adaptation capabilities.
  • Future algorithm design can be guided by the progression identified in the chronicle.
  • The approach highlights how prior task experience enables efficient handling of novel tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same formalization could be applied to evaluate adaptation in current large-scale models outside the surveyed lineage.
  • It suggests a way to design benchmarks that test transferable knowledge across explicitly defined task distributions.
  • Extending the chronicle might reveal under-explored branches in meta-RL that connect to other adaptation techniques.

Load-bearing premise

The selected landmark algorithms accurately represent the essential conceptual path to the Adaptive Agent without major omissions or selection bias.

What would settle it

Identifying a major algorithm essential to the Adaptive Agent's development that cannot be described within the task-based formalization or was omitted from the chronicle.

Figures

Figures reproduced from arXiv: 2602.19837 by Björn Hoppmann, Christoph Scholz.

Figure 1. Meta-learning of 2-way 1-shot animal classification tasks.
Figure 2. General meta-training.
Figure 3. General meta-testing paradigm. For each T_j sampled from the set of test tasks, the parameters θ_j(φ) are fine-tuned in K shots before the resulting θ′_j are evaluated on the task's test set via the task-specific loss L_j to yield the performance. The numbers N, N_val, and N_test of training, validation, and testing tasks are themselves hyperparameters of meta-training.
Figure 4. Meta-reinforcement learning to race on tracks with varying weather conditions.
Figure 5. The MAML meta-training scheme. MAML learns effective feature representations rather than a rapidly adaptable prior: during task-specific training, the initial layers of the underlying network exhibit minimal changes, suggesting that the fundamental feature representations remain stable. This may be why MAML is more stable than other landmarks, especially VariBAD (see Section 3.3).
Figure 6. General multi-task learning paradigm. Formally, in MTL a model is jointly trained to solve a given number N of tasks (T_1, T_2, …, T_N) sharing some common structure. The corresponding training process mimics standard learning at the task-specific level, but with parameters shared across the tasks. A meta-level does not exist, and hence there is no two-stage training process.
Figure 7. The PEARL meta-training scheme.
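The MAML scheme of Figure 5 can be sketched end to end on a toy task distribution. The linear-regression tasks, learning rates, and first-order meta-gradient approximation below are illustrative choices for a minimal runnable example, not details taken from the paper or from the Adaptive Agent's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task distribution p(T): 1-D linear regression tasks y = a * x,
# with the slope a drawn fresh per task (an illustrative stand-in for
# the paper's task-based setting).
def sample_task():
    a = rng.uniform(-2.0, 2.0)
    def batch(n=10):
        x = rng.uniform(-1.0, 1.0, n)
        return x, a * x
    return batch

def loss_and_grad(theta, x, y):
    # Squared-error loss of the model f(x) = theta * x, with its gradient.
    err = theta * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

def adapt(theta, batch, k_shots=10, inner_lr=0.5):
    # Inner loop / meta-testing: K gradient steps on task-specific data.
    for _ in range(k_shots):
        x, y = batch()
        _, g = loss_and_grad(theta, x, y)
        theta -= inner_lr * g
    return theta

# Outer loop / meta-training (first-order MAML flavour): move the shared
# initialization phi toward values that perform well *after* adaptation.
phi, meta_lr = 0.0, 0.05
for _ in range(300):
    batch = sample_task()
    theta_adapted = adapt(phi, batch)
    x, y = batch()  # the task's held-out query set
    _, g = loss_and_grad(theta_adapted, x, y)
    phi -= meta_lr * g  # first-order approximation of the meta-gradient

# A few shots on a brand-new task should now suffice.
batch = sample_task()
x, y = batch(100)
post_loss, _ = loss_and_grad(adapt(phi, batch), x, y)
print(f"post-adaptation loss on a new task: {post_loss:.4f}")
```

Note the two-stage structure the survey's formalization emphasizes: the inner loop fine-tunes per task, while the outer loop updates only the shared initialization, which is exactly what is absent in the multi-task setting of Figure 6.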
read the original abstract

Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating essential concepts for understanding generalist approaches.

Significance. If the formalization is precise and the chronicle representative, the survey would consolidate key ideas in meta-learning, serving as a useful reference for tracing progression toward adaptive agents and highlighting transferable knowledge acquisition across tasks.

minor comments (2)
  1. [Abstract] The claim of a 'rigorous' formalization requires explicit mathematical definitions of the task-based paradigm (e.g., task distribution, adaptation objective) in §2 or §3 to permit independent verification.
  2. [Chronicle of algorithms] State explicit inclusion criteria for the selected landmark algorithms to address potential selection bias in the path to the Adaptive Agent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and the recommendation of minor revision. The report highlights the value of our task-based formalization and chronicle of algorithms leading to DeepMind's Adaptive Agent, and raises no major objections. We will address the two minor comments in a revised version: adding explicit mathematical definitions of the task-based paradigm (task distribution, adaptation objective), and stating the inclusion criteria used to select the landmark algorithms.

Circularity Check

0 steps flagged

Survey paper exhibits no circularity: formalization and chronicle rest on external citations

full rationale

This is a survey paper whose central contribution is a task-based formalization of meta-learning/meta-RL together with a chronological narrative of selected prior algorithms leading to DeepMind's Adaptive Agent. No original theorems, empirical predictions, fitted parameters, or derivations are asserted that could reduce to the paper's own inputs by construction. All landmark algorithms and concepts are explicitly referenced to prior published work by other authors. The selection of landmarks is presented as consolidation rather than as a proof, so no load-bearing self-citation or self-definitional loop exists. The paper is anchored to external work through its citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey relies on standard definitions from machine learning literature for tasks and learning processes; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Task-based formalization of learning processes is a valid and useful organizing principle for meta-learning.
    The abstract states that the survey uses a task-based formalization to structure the field.

pith-pipeline@v0.9.0 · 5391 in / 1021 out tokens · 22447 ms · 2026-05-15T20:42:05.842459+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

208 extracted references · 208 canonical work pages · 16 internal anchors
