Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent
Pith reviewed 2026-05-15 20:42 UTC · model grok-4.3
The pith
A task-based formalization of meta-learning organizes the algorithms leading to DeepMind's Adaptive Agent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning, then uses that framework to chronicle the landmark algorithms that led to DeepMind's Adaptive Agent, consolidating the concepts needed to understand the Adaptive Agent and other generalist approaches.
What carries the argument
The task-based formalization, which structures meta-learning problems around distributions of tasks to capture transferable knowledge for rapid adaptation to new tasks.
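This organizing principle is usually written as an expected-risk objective over a task distribution. A standard sketch of that objective follows; the notation is the conventional one from the meta-learning literature, assumed rather than taken from the paper:

```latex
% Task-based meta-learning objective (standard notation, assumed):
% p(\mathcal{T}) is the task distribution, \theta the meta-parameters,
% U_{\mathcal{T}} the adaptation procedure (e.g., a few gradient steps
% on task \mathcal{T}), and \mathcal{L}_{\mathcal{T}} the task loss.
\theta^{*} \;=\; \arg\min_{\theta}\;
  \mathbb{E}_{\mathcal{T} \sim p(\mathcal{T})}
  \bigl[\, \mathcal{L}_{\mathcal{T}}\bigl( U_{\mathcal{T}}(\theta) \bigr) \,\bigr]
```

The key move is that the loss is evaluated *after* adaptation, so the meta-parameters are optimized for how well they transfer, not for how well they fit any single task.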
If this is right
- The formalization supplies a consistent structure for comparing different meta-learning methods.
- It isolates the conceptual milestones required to reach generalist adaptation capabilities.
- Future algorithm design can be guided by the progression identified in the chronicle.
- The approach highlights how prior task experience enables efficient handling of novel tasks.
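To make the last point concrete, here is a minimal first-order MAML-style loop over a toy distribution of 1-D regression tasks. Everything here (the task family, learning rates, and loop structure) is a hypothetical illustration of gradient-based meta-learning, not the Adaptive Agent's actual training procedure:

```python
import random

# Toy task distribution p(T): each task fits y = a * x, with the slope a
# drawn uniformly. A MAML-style outer loop learns an initialization theta
# that adapts well after a single inner gradient step.

random.seed(0)

def task_loss(theta, a, xs):
    # Mean squared error of the model y = theta * x on task slope a.
    return sum((theta * x - a * x) ** 2 for x in xs) / len(xs)

def task_grad(theta, a, xs):
    # d/dtheta of task_loss.
    return sum(2 * (theta * x - a * x) * x for x in xs) / len(xs)

def adapt(theta, a, xs, inner_lr=0.1, steps=1):
    # Inner loop: the adaptation procedure U_T (a few gradient steps).
    for _ in range(steps):
        theta = theta - inner_lr * task_grad(theta, a, xs)
    return theta

def meta_train(theta=0.0, outer_lr=0.05, iters=200):
    # Outer loop (first-order approximation): move the initialization
    # toward parameters that perform well *after* adaptation.
    xs = [0.5, 1.0, 1.5]
    for _ in range(iters):
        a = random.uniform(1.0, 3.0)      # sample a task from p(T)
        adapted = adapt(theta, a, xs)
        theta = theta - outer_lr * task_grad(adapted, a, xs)
    return theta

theta0 = meta_train()
novel_a = 2.5
xs = [0.5, 1.0, 1.5]
```

After meta-training, one adaptation step from `theta0` lands much closer to a novel task's solution than the same step from an uninformed initialization — the "prior task experience" doing the work.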
Where Pith is reading between the lines
- The same formalization could be applied to evaluate adaptation in current large-scale models outside the surveyed lineage.
- It suggests a way to design benchmarks that test transferable knowledge across explicitly defined task distributions.
- Extending the chronicle might reveal under-explored branches in meta-RL that connect to other adaptation techniques.
Load-bearing premise
The selected landmark algorithms accurately represent the essential conceptual path to the Adaptive Agent without major omissions or selection bias.
What would settle it
Identifying a major algorithm essential to the Adaptive Agent's development that cannot be described within the task-based formalization or was omitted from the chronicle.
Original abstract
Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating essential concepts for understanding generalist approaches.
Significance. If the formalization is precise and the chronicle representative, the survey would consolidate key ideas in meta-learning, serving as a useful reference for tracing progression toward adaptive agents and highlighting transferable knowledge acquisition across tasks.
minor comments (2)
- [Abstract] The claim of a 'rigorous' formalization requires explicit mathematical definitions of the task-based paradigm (e.g., task distribution, adaptation objective) in §2 or §3 to permit independent verification.
- [Chronicle of algorithms] State explicit inclusion criteria for the landmark algorithms to address potential selection bias in the path to the Adaptive Agent.
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation of minor revision. In response to the two minor comments: we will add explicit mathematical definitions of the task distribution and adaptation objective to the formalization sections, and we will state the inclusion criteria used to select the landmark algorithms in the chronicle. No major comments were raised, so no further points require individual responses at this time.
Circularity Check
Survey paper exhibits no circularity: formalization and chronicle rest on external citations
full rationale
This is a survey paper whose central contribution is a task-based formalization of meta-learning/meta-RL together with a chronological narrative of selected prior algorithms leading to DeepMind's Adaptive Agent. No original theorems, empirical predictions, fitted parameters, or derivations are asserted that could reduce to the paper's own inputs by construction. All landmark algorithms and concepts are explicitly referenced to prior published work by other authors. The selection of landmarks is presented as consolidation rather than proof, so no load-bearing self-citation or self-definitional loop exists; the formalization and chronicle are grounded in external work through citation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A task-based formalization of learning processes is a valid and useful organizing principle for meta-learning.