LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Pith reviewed 2026-05-12 20:59 UTC · model grok-4.3
The pith
The LIBERO benchmark shows that sequential finetuning achieves better forward transfer than specialized lifelong learning methods in robot manipulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LIBERO establishes a benchmark for lifelong decision-making in robot manipulation by providing an extendible procedural generation pipeline and four task suites that isolate declarative, procedural, and mixed knowledge transfer. The benchmark supports investigation of policy architectures, lifelong algorithms, task-ordering effects, and pretraining through standardized tasks with demonstration data. Experiments across these dimensions show that sequential finetuning outperforms prior lifelong methods on forward transfer, that no single visual encoder excels at every type of knowledge transfer, and that naive supervised pretraining can hurt subsequent lifelong performance.
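To see why the procedural pipeline can "in principle generate infinitely many tasks," consider a hypothetical sketch that crosses small vocabularies of objects, relations, and regions into language-specified goals. The vocabulary here is ours for illustration; LIBERO's actual task specification builds on a PDDL-style language [43].

```python
from itertools import product

# Hypothetical sketch of procedural task generation: crossing small
# vocabularies yields a combinatorial space of language-specified goals.
# The vocabulary below is illustrative, not LIBERO's actual specification.
objects = ["bowl", "plate", "mug"]
relations = ["on", "in", "next to"]
regions = ["stove", "cabinet", "tray"]

tasks = [
    f"put the {obj} {rel} the {region}"
    for obj, rel, region in product(objects, relations, regions)
]
print(len(tasks), "tasks, e.g.:", tasks[0])  # 27 tasks, e.g.: put the bowl on the stove
```

Growing any one vocabulary multiplies the task count, which makes the four fixed suites samples from a much larger space rather than the space itself.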
What carries the argument
The LIBERO benchmark, consisting of procedurally generated task suites that separate declarative, procedural, and mixed knowledge transfer in robot manipulation policies.
If this is right
- Sequential finetuning provides a strong baseline for forward knowledge transfer in lifelong robot learning (a minimal sketch follows this list).
- Policy visual encoders require type-specific evaluation because effectiveness differs across declarative and procedural transfer.
- Pretraining approaches need refinement since naive supervised pretraining can impair agents during later lifelong learning stages.
- Task sequence influences lifelong learner robustness, requiring algorithms that handle ordering variations.
- Provided demonstration data enables sample-efficient testing of transfer methods across the benchmark tasks.
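To make the first point concrete, here is a minimal sketch of the sequential-finetuning baseline: one policy, finetuned task by task with behavioral cloning, and no replay buffer, regularization, or other lifelong-learning machinery. `Policy`, `Task`, and their methods are illustrative stand-ins, not LIBERO's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    demos: list = field(default_factory=list)  # human-teleoperated demonstrations

class Policy:
    def finetune(self, demos):
        """Behavioral cloning on the current task's demonstrations only."""
        pass  # placeholder for a real training loop

    def success_rate(self, task: Task) -> float:
        """Roll out the policy in simulation and report task success."""
        return 0.0  # placeholder for a real evaluation rollout

def sequential_finetune(policy: Policy, tasks: list[Task]) -> list[list[float]]:
    """Return R where R[k][j] is success on task j after learning task k."""
    R = []
    for task in tasks:
        policy.finetune(task.demos)  # naive: keep finetuning, forget freely
        R.append([policy.success_rate(t) for t in tasks])
    return R
```

The matrix R is the raw material for lifelong-learning metrics such as forward transfer, sketched under "What would settle it" below.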
Where Pith is reading between the lines
- The focus on procedural knowledge indicates that lifelong learning techniques developed for non-action domains require targeted modifications when applied to robotics.
- Applying the benchmark to physical robot hardware could test whether simulation results on transfer and pretraining generalize to real environments.
- Hybrid methods that blend sequential finetuning with selective retention of procedural skills may address the observed limitations of existing approaches.
- The generation pipeline supports scaling to larger task sets for more rigorous evaluation of knowledge transfer limits.
Load-bearing premise
The four task suites and procedural pipeline sufficiently represent the main challenges of declarative, procedural, and mixed knowledge transfer in lifelong robot decision-making.
What would settle it
If repeating the same comparisons of finetuning, lifelong algorithms, encoders, and pretraining on a new collection of manipulation tasks showed specialized lifelong methods outperforming sequential finetuning in forward transfer, the reported discoveries would be refuted.
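One concrete way to score such a replication is the classic forward-transfer metric of Lopez-Paz and Ranzato [39], sketched below assuming evaluation yields a success matrix over tasks. LIBERO defines its own metric variants, so treat this as an illustrative stand-in rather than the paper's exact formula.

```python
import numpy as np

def forward_transfer(R: np.ndarray, b: np.ndarray) -> float:
    """Classic forward-transfer score (Lopez-Paz & Ranzato [39]).

    R[k, j]: success on task j after finishing training on task k, shape (T, T).
    b[j]:    success of a randomly initialized policy on task j, shape (T,).
    FWT averages R[i-1, i] - b[i], i.e. the zero-shot gain on task i from
    having learned tasks 0..i-1 first.
    """
    T = R.shape[0]
    return float(np.mean([R[i - 1, i] - b[i] for i in range(1, T)]))
```

Under this score, the refutation condition above amounts to a specialized lifelong method achieving a higher FWT than sequential finetuning on the new task collection.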
Original abstract
Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and concepts, lifelong learning in decision-making (LLDM) also necessitates the transfer of procedural knowledge, such as actions and behaviors. To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. Specifically, LIBERO highlights five key research topics in LLDM: 1) how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both; 2) how to design effective policy architectures and 3) effective algorithms for LLDM; 4) the robustness of a lifelong learner with respect to task ordering; and 5) the effect of model pretraining for LLDM. We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks. For benchmarking purpose, we create four task suites (130 tasks in total) that we use to investigate the above-mentioned research topics. To support sample-efficient learning, we provide high-quality human-teleoperated demonstration data for all tasks. Our extensive experiments present several insightful or even unexpected discoveries: sequential finetuning outperforms existing lifelong learning methods in forward transfer, no single visual encoder architecture excels at all types of knowledge transfer, and naive supervised pretraining can hinder agents' performance in the subsequent LLDM. Check the website at https://libero-project.github.io for the code and the datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LIBERO, a benchmark for lifelong learning in robot manipulation (LLDM) featuring an extendible procedural generation pipeline and four fixed task suites (130 tasks total) with human-teleoperated demonstrations. It positions the benchmark to study five LLDM topics: efficient transfer of declarative/procedural/mixed knowledge, policy architectures, algorithms, robustness to task ordering, and pretraining effects. Experiments report three main findings: sequential finetuning outperforms existing lifelong methods on forward transfer, no single visual encoder excels across all transfer types, and naive supervised pretraining can hinder subsequent LLDM performance.
Significance. If the task suites validly isolate the targeted knowledge-transfer distinctions, LIBERO would be a useful standardized resource for reproducible study of lifelong robot learning, filling a gap between declarative transfer benchmarks in vision/language and the procedural demands of decision-making. The provision of high-quality demos, procedural extensibility, and public code/datasets supports reproducibility and could accelerate work on the five listed topics.
Major comments (3)
- §3.2–3.3 (Task Suites and Procedural Pipeline): The claim that the four suites differentially stress declarative knowledge (entities/concepts), procedural knowledge (actions/behaviors), or mixtures of both is asserted without quantitative support (such as a per-task breakdown of required skills, horizon lengths, or visual similarity metrics) and without any sensitivity analysis showing that observed performance gaps arise from these distinctions rather than incidental factors.
- §4 (Experiments): The three reported discoveries (sequential finetuning superiority, visual-encoder non-universality, pretraining hindrance) are presented without error bars, statistical significance tests, full hyperparameter protocols, or baseline implementation details, making it impossible to assess whether the results are robust to random seeds and implementation choices.
- §4.3 (Pretraining and Transfer): The finding that naive supervised pretraining can hinder LLDM performance is load-bearing for topic 5, yet it lacks controls for pretraining dataset size, domain gap, and fine-tuning schedule, leaving open whether the hindrance is general or specific to the chosen pretraining regime and task ordering.
Minor comments (3)
- Figures/Tables: In Figure 3 and Table 2, axis labels and legend entries are too small to read; consider increasing the font size and adding explicit task-suite identifiers.
- §2 (Related Work): A few recent lifelong robot learning benchmarks (e.g., those built on simulation suites such as RLBench or Meta-World) are cited but not compared on metrics such as task diversity or demonstration quality.
- Notation (throughout): The distinction between “declarative” and “procedural” knowledge is used throughout but never given an operational definition tied to the task suites; a short clarifying paragraph would help.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. The comments have helped us identify areas where additional quantitative support and experimental rigor will strengthen the presentation of LIBERO. We address each major comment below and indicate the revisions made.
Point-by-point responses
Referee: §3.2–3.3 (Task Suites and Procedural Pipeline): The claim that the four suites differentially stress declarative knowledge (entities/concepts), procedural knowledge (actions/behaviors), or mixtures of both is asserted without quantitative support (such as a per-task breakdown of required skills, horizon lengths, or visual similarity metrics) and without any sensitivity analysis showing that observed performance gaps arise from these distinctions rather than incidental factors.
Authors: We agree that the original manuscript would benefit from explicit quantitative validation of the intended knowledge distinctions. In the revised version, we have added a new table in §3.2 that provides a per-task breakdown of required skills (e.g., object manipulation vs. spatial reasoning), average horizon lengths across suites, and visual similarity metrics computed as average cosine distances between task image features extracted from a frozen CLIP encoder. We have also included a sensitivity analysis in §3.3: by systematically ablating declarative elements (e.g., object identity changes) or procedural elements (e.g., action sequence modifications) in selected tasks and measuring the resulting transfer performance gaps, we show that the observed differences align with the targeted knowledge types rather than incidental factors. These additions directly support the differential stress claims.
Revision: yes
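A minimal sketch of the visual-similarity metric the response describes, assuming any frozen image encoder (for example, CLIP's visual tower); the helper and its inputs are our illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def task_visual_distances(encoder, task_images):
    """Pairwise cosine distances between per-task features from a frozen
    image encoder. `task_images` is a list of (N_i, C, H, W) tensors, one
    batch of rendered frames per task; averaging the off-diagonal entries
    of the returned matrix gives the suite-level metric."""
    with torch.no_grad():
        feats = torch.stack([
            F.normalize(encoder(imgs).mean(dim=0), dim=-1)  # one unit vector per task
            for imgs in task_images
        ])
    return 1.0 - feats @ feats.T  # (num_tasks, num_tasks) cosine-distance matrix
```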
Referee: §4 (Experiments): The three reported discoveries (sequential finetuning superiority, visual-encoder non-universality, pretraining hindrance) are presented without error bars, statistical significance tests, full hyperparameter protocols, or baseline implementation details, making it impossible to assess whether the results are robust to random seeds and implementation choices.
Authors: We acknowledge that the lack of statistical reporting and implementation details limits reproducibility assessment. We have revised §4 and the appendix to include error bars computed from five independent runs with distinct random seeds for all reported metrics. Paired t-tests with p-values are now provided for the key comparisons underlying the three discoveries. The appendix has been expanded with complete hyperparameter tables for every method and baseline (including learning rates, batch sizes, and network dimensions), along with explicit references to the public code repository where the exact implementations can be inspected.
Revision: yes
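A minimal sketch of the proposed statistical protocol, assuming per-seed success rates are collected into arrays; `scipy.stats.ttest_rel` performs the paired t-test across matching seeds.

```python
import numpy as np
from scipy import stats

def compare_methods(a: np.ndarray, b: np.ndarray) -> dict:
    """Mean, standard error, and paired t-test for two methods evaluated
    with the same random seeds (e.g. five runs each, as in the revision)."""
    sem = lambda x: x.std(ddof=1) / np.sqrt(len(x))
    t_stat, p_value = stats.ttest_rel(a, b)  # paired across matching seeds
    return {"a": (a.mean(), sem(a)), "b": (b.mean(), sem(b)), "t": t_stat, "p": p_value}

# Illustrative numbers, not the paper's results: five seeds per method.
seq_ft = np.array([0.72, 0.69, 0.74, 0.71, 0.70])
lifelong = np.array([0.61, 0.65, 0.60, 0.63, 0.62])
print(compare_methods(seq_ft, lifelong))
```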
Referee: §4.3 (Pretraining and Transfer): The finding that naive supervised pretraining can hinder LLDM performance is load-bearing for topic 5, yet it lacks controls for pretraining dataset size, domain gap, and fine-tuning schedule, leaving open whether the hindrance is general or specific to the chosen pretraining regime and task ordering.
Authors: The original experiments used a standard ImageNet-supervised pretrained visual encoder as the naive baseline. To address the referee's concern, we have added controlled experiments in the revised §4.3: (1) varying pretraining dataset size via random subsets of ImageNet, (2) reducing domain gap by comparing against a model pretrained on a large robot manipulation dataset, and (3) testing multiple fine-tuning schedules (different learning-rate decays and epoch counts). The hindrance effect remains consistent across these controls, supporting that it is not an artifact of the specific regime or ordering. We have updated the text to clarify the scope of the claim while noting that exhaustive exploration of all pretraining variants lies beyond a single benchmark paper.
Revision: partial
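For concreteness, the three control axes the response lists can be enumerated as a small configuration grid; the values below are illustrative placeholders, not the paper's actual settings.

```python
from itertools import product

pretrain_fractions = [0.1, 0.5, 1.0]                   # random ImageNet subsets
pretrain_sources = ["imagenet", "robot_manipulation"]  # domain-gap control
finetune_schedules = [("cosine", 50), ("step", 100)]   # (lr decay, epochs)

configs = [
    {"fraction": f, "source": s, "schedule": sched}
    for f, s, sched in product(pretrain_fractions, pretrain_sources, finetune_schedules)
]
# All 12 configs feed the same lifelong-learning pipeline, so a hindrance
# that persists across them cannot be blamed on one pretraining regime.
```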
Circularity Check
No circularity: empirical benchmark with no derivations or self-referential predictions
Full rationale
The paper introduces LIBERO as an empirical benchmark for lifelong decision-making in robot manipulation, supported by procedural task generation, four task suites, and human demonstration data. It reports experimental findings on transfer, architectures, algorithms, ordering, and pretraining without equations, fitted parameters renamed as predictions, or derivation chains. No self-citations are load-bearing for the core claims, and the released code and datasets allow external validation. The primary assumption (suite representativeness) is an empirical design choice open to external testing rather than a circular reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The four task suites and procedural generation pipeline capture the essential challenges of declarative, procedural, and mixed knowledge transfer in robot lifelong learning.
Forward citations
Cited by 33 Pith papers
- Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
  A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.
- ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
  ALAM creates algebraically consistent latent action transitions from videos to act as auxiliary generative targets, raising robot policy success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
- OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
  OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
- Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
  A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...
- Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models
  Privileged Foresight Distillation distills the residual difference in action predictions with versus without future context into a current-only adapter, yielding consistent gains on LIBERO and RoboTwin benchmarks.
- CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
  CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
- Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
  VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
- Mask World Model: Predicting What Matters for Robust Robot Policy Learning
  Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization...
- STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations
  STRONG-VLA uses decoupled two-stage training to improve VLA model robustness, yielding up to 16% higher task success rates under seen and unseen perturbations on the LIBERO benchmark.
- TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning
  TMRL bridges behavioral cloning pretraining and RL finetuning via diffusion noise and timestep modulation to enable controlled exploration, improving sample efficiency and enabling real-world robot training in under one hour.
- Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
  Pace-and-Path Correction is a closed-form inference-time operator that decomposes a quadratic cost minimization into orthogonal pace compression and path offset channels to correct dynamics-blindness in chunked-action...
- BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation
  BEACON uses discrepancy-aware importance reweighting to jointly train diffusion-based robot policies and source sample weights, improving performance over target-only and fixed-ratio baselines in cross-domain manipula...
- BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation
  BEACON uses discrepancy-aware importance reweighting to co-train generative robot policies from abundant source and limited target demonstrations, yielding better robustness and implicit feature alignment.
- One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
  Reducing visual input to one token per frame in world models for vision-language-action policies maintains long-horizon performance while improving success rates on MetaWorld, LIBERO, and real-robot tasks.
- One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
  Reducing visual input to one token per frame via adaptive attention pooling and a unified flow-matching objective improves long-horizon performance in VLA policies on MetaWorld, LIBERO, and real-robot tasks.
- Predictive but Not Plannable: RC-aux for Latent World Models
  RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
- Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
  VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
- ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation
  ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.
- PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
  PRTS pretrains VLA models with contrastive goal-conditioned RL to embed goal-reachability probabilities from offline data, yielding SOTA results on robotic benchmarks especially for long-horizon and novel instructions.
- Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
  Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.
- CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors
  CorridorVLA improves VLA models by using predicted sparse anchors to impose explicit spatial corridors on action trajectories, yielding 3.4-12.4% success rate gains on LIBERO-Plus with GR00T-Corr reaching 83.21%.
- Grounded World Model for Semantically Generalizable Planning
  A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
- RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
  RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
- Fast-WAM: Do World Action Models Need Test-time Future Imagination?
  Fast-WAM shows that explicit future imagination at test time is not required for strong WAM performance; video modeling during training provides the main benefit.
- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
  RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.
- ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation
  ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.
- Understanding Asynchronous Inference Methods for Vision-Language-Action Models
  Controlled benchmarks show per-step residual correction (A2C2) as most effective for VLA asynchronous inference up to d=8 delays on Kinetix with over 90% solve rate, outperforming inpainting and conditioning while tra...
- Gated Memory Policy
  GMP selectively activates and represents memory via a gate and lightweight cross-attention, yielding 30.1% higher success on non-Markovian robotic tasks while staying competitive on Markovian ones.
- From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data
  A survey introduces an interface-centric taxonomy for video-to-control methods in robotic manipulation and identifies the robotics integration layer as the central open challenge.
- SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
  SpatialVLA adds 3D-aware position encoding and adaptive discretized action grids to visual-language-action models, enabling strong zero-shot performance and fine-tuning on new robot setups after pre-training on 1.1 mi...
- World Action Models: The Next Frontier in Embodied AI
  The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
- OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
  OmniVLA-RL uses a mix-of-transformers architecture and flow-matching reformulated as SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.
- Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
  A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data...
Reference graph
Works this paper leans on
- [1]
- [2] Few-shot continual active learning by a robot
  Ali Ayub and Carter Fendley. Few-shot continual active learning by a robot. arXiv preprint arXiv:2210.04137, 2022.
- [3] F-SIOL-310: A robotic dataset and benchmark for few-shot incremental object learning
  Ali Ayub and Alan R Wagner. F-SIOL-310: A robotic dataset and benchmark for few-shot incremental object learning. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13496–13502. IEEE, 2021.
- [4] A framework for behavioural cloning
  Michael Bain and Claude Sammut. A framework for behavioural cloning. In Machine Intelligence 15, pages 103–129, 1995.
- [5] Lifelong reinforcement learning with modulating masks
  Eseoghene Ben-Iwhiwhu, Saptarshi Nath, Praveen K Pilly, Soheil Kolouri, and Andrea Soltoggio. Lifelong reinforcement learning with modulating masks. arXiv preprint arXiv:2212.11110, 2022.
- [6] Curriculum learning
  Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009.
- [7] Continual lifelong learning in natural language processing: A survey
  Magdalena Biesialska, Katarzyna Biesialska, and Marta R Costa-Jussa. Continual lifelong learning in natural language processing: A survey. arXiv preprint arXiv:2012.09823, 2020.
- [8]
- [9] Dark experience for general continual learning: a strong, simple baseline
  Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. Advances in neural information processing systems, 33:15920–15930, 2020.
- [10] Multitask learning
  Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
- [11] Riemannian walk for incremental learning: Understanding forgetting and intransigence
  Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pages 532–547, 2018.
- [12] Efficient lifelong learning with A-GEM
  Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. arXiv preprint arXiv:1812.00420, 2018.
- [13] On tiny episodic memories in continual learning
  Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc’Aurelio Ranzato. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019.
- [14] Superposition of many models into one
  Brian Cheung, Alexander Terekhov, Yubei Chen, Pulkit Agrawal, and Bruno Olshausen. Superposition of many models into one. Advances in neural information processing systems, 32, 2019.
- [15] Leveraging procedural generation to benchmark reinforcement learning
  Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. Leveraging procedural generation to benchmark reinforcement learning. In International conference on machine learning, pages 2048–2056. PMLR, 2020.
- [16] A continual learning survey: Defying forgetting in classification tasks
  Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern analysis and machine intelligence, 44(7):3366–3385, 2021.
- [17] ImageNet: A large-scale hierarchical image database
  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
- [18] The MNIST database of handwritten digit images for machine learning research
  Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
- [19] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- [20] Don't forget, there is more than forgetting: new metrics for continual learning
  Natalia Díaz-Rodríguez, Vincenzo Lomonaco, David Filliat, and Davide Maltoni. Don't forget, there is more than forgetting: new metrics for continual learning. arXiv preprint arXiv:1810.13166, 2018.
- [21] Memory efficient continual learning with transformers
  Beyza Ermis, Giovanni Zappella, Martin Wistuba, and Cédric Archambeau. Memory efficient continual learning with transformers. 2022.
- [22] Ego4D: Around the world in 3,000 hours of egocentric video
  Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, et al. Ego4D: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18995–19012, 2022.
- [23] Visualizing and understanding atari agents
  Sam Greydanus, Anurag Koul, Jonathan Dodge, and Alan Fern. Visualizing and understanding atari agents. arXiv preprint arXiv:1711.00138, 2017.
- [24] ManiSkill2: A unified benchmark for generalizable manipulation skills
  Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Zhan Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, Xinyue Wei, Yunchao Yao, et al. ManiSkill2: A unified benchmark for generalizable manipulation skills. arXiv preprint arXiv:2302.04659, 2023.
- [25] Deep recurrent Q-learning for partially observable MDPs
  Matthew Hausknecht and Peter Stone. Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI fall symposium series, 2015.
- [26] Compacting, picking and growing for unforgetting continual learning
  Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. Advances in Neural Information Processing Systems, 32, 2019.
- [27] RLBench: The robot learning benchmark & learning environment
  Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020.
- [28] The foundation of efficient robot learning
  Leslie Pack Kaelbling. The foundation of efficient robot learning. Science, 369(6506):915–916, 2020.
- [29] Class-incremental learning by knowledge distillation with adaptive feature consolidation
  Minsoo Kang, Jaeyoo Park, and Bohyung Han. Class-incremental learning by knowledge distillation with adaptive feature consolidation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16071–16080, 2022.
- [30] ViZDoom: A Doom-based AI research platform for visual reinforcement learning
  Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In 2016 IEEE conference on computational intelligence and games (CIG), pages 1–8. IEEE, 2016.
- [31] ViLT: Vision-and-language transformer without convolution or region supervision
  Wonjae Kim, Bokyung Son, and Ildoo Kim. ViLT: Vision-and-language transformer without convolution or region supervision. In International Conference on Machine Learning, pages 5583–5594. PMLR, 2021.
- [32] Adam: A Method for Stochastic Optimization
  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [33] Overcoming catastrophic forgetting in neural networks
  James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
- [34] Learning multiple layers of features from tiny images
  Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
- [35] Behavior-1K: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation
  Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, et al. Behavior-1K: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation. In Conference on Robot Learning, pages 80–93. PMLR, 2023.
- [36] Continual learning and private unlearning
  B. Liu, Qian Liu, and Peter Stone. Continual learning and private unlearning. In CoLLAs, 2022.
- [37] Continual learning with recursive gradient optimization
  Hao Liu and Huaping Liu. Continual learning with recursive gradient optimization. arXiv preprint arXiv:2201.12522, 2022.
- [38] CORe50: a new dataset and benchmark for continuous object recognition
  Vincenzo Lomonaco and Davide Maltoni. CORe50: a new dataset and benchmark for continuous object recognition. In Conference on Robot Learning, pages 17–26. PMLR, 2017.
- [39] Gradient episodic memory for continual learning
  David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30, 2017.
- [40] Online continual learning in image classification: An empirical survey
  Zheda Mai, Ruiwen Li, Jihwan Jeong, David Quispe, Hyunwoo Kim, and Scott Sanner. Online continual learning in image classification: An empirical survey. Neurocomputing, 469:28–51, 2022.
- [41] PackNet: Adding multiple tasks to a single network by iterative pruning
  Arun Mallya and Svetlana Lazebnik. PackNet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 7765–7773, 2018.
- [42] What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
  Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. arXiv preprint arXiv:2108.03298, 2021.
- [43] PDDL: the planning domain definition language
  Drew McDermott, Malik Ghallab, Adele Howe, Craig Knoblock, Ashwin Ram, Manuela Veloso, Daniel Weld, and David Wilkins. PDDL: the planning domain definition language. 1998.
- [44] CompoSuite: A compositional reinforcement learning benchmark
  Jorge A Mendez, Marcel Hussing, Meghna Gummadi, and Eric Eaton. CompoSuite: A compositional reinforcement learning benchmark. arXiv preprint arXiv:2207.04136, 2022.
- [45] Architecture matters in continual learning
  Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Timothy Nguyen, Razvan Pascanu, Dilan Gorur, and Mehrdad Farajtabar. Architecture matters in continual learning. arXiv preprint arXiv:2202.00275, 2022.
- [46] Playing Atari with Deep Reinforcement Learning
  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- [47] ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations
  Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, and Hao Su. ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations. arXiv preprint arXiv:2107.14483, 2021.
- [48] Curriculum learning for reinforcement learning domains: A framework and survey
  Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E Taylor, and Peter Stone. Curriculum learning for reinforcement learning domains: A framework and survey. arXiv preprint arXiv:2003.04960, 2020.
- [49] Continual lifelong learning with neural networks: A review
  German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.
- [50] FiLM: Visual reasoning with a general conditioning layer
  Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [51] CORA: Benchmarks, baselines, and metrics as a platform for continual reinforcement learning agents
  Sam Powers, Eliot Xing, Eric Kolve, Roozbeh Mottaghi, and Abhinav Gupta. CORA: Benchmarks, baselines, and metrics as a platform for continual reinforcement learning agents. arXiv preprint arXiv:2110.10067, 2021.
- [52] Learning transferable visual models from natural language supervision
  Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- [53] Language models are unsupervised multitask learners
  Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- [54] Lifelong learning without a task oracle
  Amanda Rios and Laurent Itti. Lifelong learning without a task oracle. In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), pages 255–263. IEEE, 2020.
- [55] A reduction of imitation learning and structured prediction to no-regret online learning
  Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceedings, 2011.
- [56] Progressive neural networks
  Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
- [57] SPACE: Structured compression and sharing of representational space for continual learning
  Gobinda Saha, Isha Garg, Aayush Ankit, and Kaushik Roy. SPACE: Structured compression and sharing of representational space for continual learning. IEEE Access, 9:150480–150494, 2021.
- [58] MiniHack the planet: A sandbox for open-ended reinforcement learning research
  Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, and Tim Rocktäschel. MiniHack the planet: A sandbox for open-ended reinforcement learning research. arXiv preprint arXiv:2109.13202, 2021.
- [59] SuperGlue: Learning feature matching with graph neural networks
  Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.
- [60] Progress & compress: A scalable framework for continual learning
  Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. In International Conference on Machine Learning, pages 4528–4537. PMLR, 2018.
- [61] OpenLORIS-Object: A robotic vision dataset and benchmark for lifelong deep learning
  Qi She, Fan Feng, Xinyue Hao, Qihan Yang, Chuanlin Lan, Vincenzo Lomonaco, Xuesong Shi, Zhengwei Wang, Yao Guo, Yimin Zhang, et al. OpenLORIS-Object: A robotic vision dataset and benchmark for lifelong deep learning. In 2020 IEEE international conference on robotics and automation (ICRA), pages 4767–4773. IEEE, 2020.
- [62] ALFRED: A benchmark for interpreting grounded instructions for everyday tasks
  Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, and Dieter Fox. ALFRED: A benchmark for interpreting grounded instructions for everyday tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10740–10749, 2020.
- [63] Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments
  Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Elliott Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, Karen Liu, et al. Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments. In Conference on Robot Learning, pages 477–490. PMLR, 2022.
- [64] Open-ended learning leads to generally capable agents
  Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, et al. Open-ended learning leads to generally capable agents. arXiv preprint arXiv:2107.12808, 2021.
- [65] Lifelong robot learning
  Sebastian Thrun and Tom M Mitchell. Lifelong robot learning. Robotics and autonomous systems, 15(1-2):25–46, 1995.
- [66] Attention is all you need
  Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
- [67] GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
- [68] MimicPlay: Long-horizon imitation learning by watching human play
  Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, and Anima Anandkumar. MimicPlay: Long-horizon imitation learning by watching human play. arXiv preprint arXiv:2302.12422, 2023.
- [69] Continual World: A robotic benchmark for continual reinforcement learning
  Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, and Piotr Miłoś. Continual World: A robotic benchmark for continual reinforcement learning. In Neural Information Processing Systems, 2021.
- [70] Disentangling transfer in continual reinforcement learning
  Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, and Piotr Miłoś. Disentangling transfer in continual reinforcement learning. arXiv preprint arXiv:2209.13900, 2022.
- [71] Firefly neural architecture descent: a general approach for growing neural networks
  Lemeng Wu, Bo Liu, Peter Stone, and Qiang Liu. Firefly neural architecture descent: a general approach for growing neural networks. Advances in Neural Information Processing Systems, 33:22373–22383, 2020.
- [72] Lifelong learning with dynamically expandable networks
  Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong learning with dynamically expandable networks. arXiv preprint arXiv:1708.01547, 2017.
- [73] Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning
  Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on robot learning, pages 1094–1100. PMLR, 2020.
- [74] Forward compatible few-shot class-incremental learning
  Da-Wei Zhou, Fu-Yun Wang, Han-Jia Ye, Liang Ma, Shiliang Pu, and De-Chuan Zhan. Forward compatible few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9046–9056, 2022.
- [75] VIOLA: Imitation learning for vision-based manipulation with object proposal priors
  Yifeng Zhu, Abhishek Joshi, Peter Stone, and Yuke Zhu. VIOLA: Imitation learning for vision-based manipulation with object proposal priors. arXiv preprint arXiv:2210.11339, 2022.
- [76] robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
  Yuke Zhu, Josiah Wong, Ajay Mandlekar, and Roberto Martín-Martín. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020.