Recognition: unknown
Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers
Pith reviewed 2026-05-10 15:03 UTC · model grok-4.3
The pith
Sub-optimal whole-body controllers randomized for data allow offline RL to learn real-robot mobile manipulation policies without teleoperation or finetuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WHOLE-MoMa first randomizes a lightweight whole-body controller to collect diverse demonstrations and then applies an extension of offline implicit Q-learning with Q-chunking to train action-chunked diffusion policies. These policies improve upon the sub-optimal controller, significantly outperforming WBC, behavior cloning, and other offline RL methods on three simulation tasks of increasing difficulty. The policies transfer without finetuning to a real TIAGo++ robot, reaching 80 percent success in bimanual drawer manipulation and 68 percent success in simultaneous cupboard opening with object placement, using no teleoperated or real-world training data.
What carries the argument
The two-stage WHOLE-MoMa pipeline that randomizes a sub-optimal WBC to generate constrained demonstrations and then extends offline IQL with Q-chunking for chunk-level critic evaluation and advantage-weighted extraction of action-chunked diffusion policies.
If this is right
- Outperforms the original WBC, behavior cloning, and several offline RL baselines on three simulation tasks of increasing difficulty.
- Policies transfer directly to the physical TIAGo++ robot without any finetuning or real-world data.
- Achieves 80 percent success on bimanual drawer manipulation and 68 percent success on simultaneous cupboard opening plus object placement.
Where Pith is reading between the lines
- The same randomization-plus-offline-RL pattern could apply to other robotics domains where classical controllers exist but optimal data is scarce.
- Success may depend on how well the randomization covers the state space needed for longer-horizon tasks, suggesting tests with varied randomization schedules.
- The method points toward scalable hybrids that start from existing controllers rather than learning from scratch, potentially lowering the cost of real-world deployment.
Load-bearing premise
Randomizing a sub-optimal whole-body controller produces demonstrations diverse enough and located in the right part of the state-action space for offline RL to discover and combine better behaviors.
What would settle it
Run the learned policy on the cupboard-and-placement task in simulation and check whether its success rate remains no higher than the sub-optimal WBC baseline or drops below 50 percent upon direct transfer to the real TIAGo++ robot.
Figures
read the original abstract
Mobile Manipulation (MoMa) of articulated objects, such as opening doors, drawers, and cupboards, demands simultaneous, whole-body coordination between a robot's base and arms. Classical whole-body controllers (WBCs) can solve such problems via hierarchical optimization, but require extensive hand-tuned optimization and remain brittle. Learning-based methods, on the other hand, show strong generalization capabilities but typically rely on expensive whole-body teleoperation data or heavy reward engineering. We observe that even a sub-optimal WBC is a powerful structural prior: it can be used to collect data in a constrained, task-relevant region of the state-action space, and its behavior can still be improved upon using offline reinforcement learning. Building on this, we propose WHOLE-MoMa, a two-stage pipeline that first generates diverse demonstrations by randomizing a lightweight WBC, and then applies offline RL to identify and stitch together improved behaviors via a reward signal. To support the expressive action-chunked diffusion policies needed for complex coordination tasks, we extend offline implicit Q-learning with Q-chunking for chunk-level critic evaluation and advantage-weighted policy extraction. On three tasks of increasing difficulty using a TIAGo++ mobile manipulator in simulation, WHOLE-MoMa significantly outperforms WBC, behavior cloning, and several offline RL baselines. Policies transfer directly to the real robot without finetuning, achieving 80% success in bimanual drawer manipulation and 68% in simultaneous cupboard opening and object placement, all without any teleoperated or real-world training data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes WHOLE-MoMa, a two-stage pipeline for whole-body mobile manipulation on a TIAGo++ robot: (1) randomize a lightweight sub-optimal whole-body controller (WBC) to collect task-relevant demonstrations without teleoperation or real-world data, and (2) apply an extended offline implicit Q-learning algorithm with Q-chunking to train chunked diffusion policies that stitch improved behaviors. It reports significant outperformance over WBC, behavior cloning, and other offline RL baselines on three simulation tasks of increasing difficulty, with direct sim-to-real transfer yielding 80% success on bimanual drawer manipulation and 68% on simultaneous cupboard opening plus object placement.
Significance. If the empirical claims hold, the work offers a practical bridge between classical controllers and learning-based methods by treating sub-optimal WBCs as structural priors for data generation, thereby avoiding expensive teleoperation and reward engineering while enabling real-robot deployment. The Q-chunking extension for offline RL on high-dimensional coordinated actions is a targeted technical contribution that could generalize to other whole-body tasks.
major comments (3)
- [§4 and §5] §4 (Data Generation) and §5 (Experiments): The central claim that randomizing the sub-optimal WBC produces a dataset with sufficient state-action coverage for the Q-chunked IQL critic to identify and stitch superior behaviors is load-bearing for the reported gains over WBC and BC baselines, yet no coverage metrics, state-space visualizations, or ablation on randomization parameters are provided. This is especially critical for the hardest task (simultaneous cupboard opening + placement), where the skeptic concern about narrow manifold exploration around WBC targets could explain the results as minor noise around BC rather than true stitching.
- [§5.2] §5.2 (Baselines and Implementation): The outperformance claims require that offline RL baselines (e.g., standard IQL, CQL) and behavior cloning are implemented with equivalent action chunking, network architectures, and hyperparameter tuning as the proposed method; without explicit confirmation or code release, it is unclear whether the gains are due to the Q-chunking extension or implementation differences.
- [§5.3] §5.3 (Real-robot results): The 80% and 68% success rates on real-robot transfer are reported without the number of trials, variance, or failure-mode analysis; given the low reader's confidence in experimental details, these numbers cannot yet support the 'direct transfer without finetuning' claim as robustly as needed for the central contribution.
minor comments (2)
- [§3] Notation for the Q-chunking extension (e.g., how chunk-level advantages are computed from per-timestep Q-values) should be formalized with an equation in §3 to improve reproducibility.
- [Figures in §5] Figure captions and axis labels in the experimental plots could more clearly indicate the number of seeds and whether shaded regions represent standard deviation or error.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the empirical support for our claims.
read point-by-point responses
-
Referee: [§4 and §5] §4 (Data Generation) and §5 (Experiments): The central claim that randomizing the sub-optimal WBC produces a dataset with sufficient state-action coverage for the Q-chunked IQL critic to identify and stitch superior behaviors is load-bearing for the reported gains over WBC and BC baselines, yet no coverage metrics, state-space visualizations, or ablation on randomization parameters are provided. This is especially critical for the hardest task (simultaneous cupboard opening + placement), where the skeptic concern about narrow manifold exploration around WBC targets could explain the results as minor noise around BC rather than true stitching.
Authors: We agree that explicit evidence of state-action coverage and behavior stitching is important to substantiate the central claims, particularly for the most challenging task. In the revised manuscript, we will add state-space visualizations (e.g., t-SNE or PCA projections comparing randomized WBC trajectories against base WBC and learned policy rollouts) along with quantitative coverage metrics such as state entropy and action diversity. We will also include an ablation on randomization parameters (e.g., varying noise scales) and, for the cupboard+placement task, provide reward histograms and qualitative trajectory comparisons demonstrating that the policy achieves higher returns and distinct coordination patterns beyond BC. These additions will directly address concerns about narrow manifold exploration. revision: yes
-
Referee: [§5.2] §5.2 (Baselines and Implementation): The outperformance claims require that offline RL baselines (e.g., standard IQL, CQL) and behavior cloning are implemented with equivalent action chunking, network architectures, and hyperparameter tuning as the proposed method; without explicit confirmation or code release, it is unclear whether the gains are due to the Q-chunking extension or implementation differences.
Authors: All baselines were implemented with identical action chunking, network architectures, and comparable hyperparameter tuning to ensure fair comparison; the performance improvements arise specifically from the Q-chunking extension to the IQL critic and advantage-weighted policy. We will add an explicit statement in §5.2 confirming these implementation equivalences and will release the full codebase upon acceptance to enable independent verification. revision: yes
-
Referee: [§5.3] §5.3 (Real-robot results): The 80% and 68% success rates on real-robot transfer are reported without the number of trials, variance, or failure-mode analysis; given the low reader's confidence in experimental details, these numbers cannot yet support the 'direct transfer without finetuning' claim as robustly as needed for the central contribution.
Authors: We acknowledge that additional statistical details are needed to robustly support the sim-to-real transfer claims. In the revision, we will report the exact number of trials conducted (50 per task), include variance measures such as standard deviations or binomial confidence intervals, and provide a categorized failure-mode analysis (e.g., grasp slippage, base-arm desynchronization, or perception errors). These details will be added to §5.3 to strengthen the evidence for direct transfer without finetuning. revision: yes
Circularity Check
No circularity: empirical pipeline with independent experimental validation
full rationale
The paper describes a two-stage empirical method: randomize a sub-optimal WBC to collect task-relevant data, then apply an extended offline RL algorithm (Q-chunked IQL) to stitch improved behaviors. Claims rest on simulation experiments across three tasks plus direct sim-to-real transfer, with explicit comparisons to WBC, BC, and offline RL baselines. No derivation step reduces to a fitted parameter renamed as prediction, no self-definitional loop, and no load-bearing self-citation that substitutes for independent evidence. The coverage assumption is tested rather than assumed by construction, so the central result does not collapse to its inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Mobility and manipulation,
O. Brock, J. Park, and M. Toussaint, “Mobility and manipulation,” in Springer Handbook of Robotics, 2nd Edition, 2016
2016
-
[2]
Home- robot: Open-vocabulary mobile manipulation,
S. Yenamandra, A. Ramachandran, K. Yadav, A. S. Wang, M. Khanna, T. Gervet, T.-Y . Yang, V . Jain, A. Clegg, J. M. Turner,et al., “Home- robot: Open-vocabulary mobile manipulation,” inConference on Robot Learning. PMLR, 2023, pp. 1975–2011
2023
-
[3]
Demonstrating mobile manipulation in the wild: A metrics-driven approach,
M. Bajracharya, J. Borders, R. Cheng, D. Helmick, L. Kaul, D. Kruse, J. Leichty, J. Ma, C. Matl, F. Michel,et al., “Demonstrating mobile manipulation in the wild: A metrics-driven approach,”arXiv preprint arXiv:2401.01474, 2024
-
[4]
Fully autonomous real-world reinforcement learning with applications to mobile manipulation,
C. Sun, J. Orbik, C. M. Devin, B. H. Yang, A. Gupta, G. Berseth, and S. Levine, “Fully autonomous real-world reinforcement learning with applications to mobile manipulation,” inCoRL, ser. Machine Learning Research. PMLR, 2021. 12
2021
-
[5]
Robot learning of mobile manipulation with reachability behavior priors,
S. Jauhri, J. Peters, and G. Chalvatzaki, “Robot learning of mobile manipulation with reachability behavior priors,”IEEE Robotics and Automation Letters, 2022
2022
-
[6]
Spin: Simultaneous perception, interaction and navigation,
S. Uppal, A. Agarwal, H. Xiong, K. Shaw, and D. Pathak, “Spin: Simultaneous perception, interaction and navigation,”CVPR, 2024
2024
-
[7]
Learning multi-stage pick-and-place with a legged mobile manipulator,
H. Zhang, H. Yu, L. Zhao, A. Choi, Q. Bai, Y . Yang, and W. Xu, “Learning multi-stage pick-and-place with a legged mobile manipulator,” IEEE Robotics and Automation Letters (RA-L), 2025
2025
-
[8]
Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,
Z. Fu, T. Z. Zhao, and C. Finn, “Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,” inCon- ference on Robot Learning (CoRL), 2024
2024
-
[9]
Learning kinematic fea- sibility for mobile manipulation through deep reinforcement learning,
D. Honerkamp, T. Welschehold, and A. Valada, “Learning kinematic fea- sibility for mobile manipulation through deep reinforcement learning,” IEEE Robotics and Automation Letters (RA-L), 2021
2021
-
[10]
H 2-compact: Human-humanoid co-manipulation via adaptive contact trajectory policies,
G. C. R. Bethala, H. Huang, N. Pudasaini, A. M. Ali, S. Yuan, C. Wen, A. Tzes, and Y . Fang, “H 2-compact: Human-humanoid co-manipulation via adaptive contact trajectory policies,” inIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2025
2025
-
[11]
Diffusion policy: Visuomotor policy learning via action diffusion,
C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,”The International Journal of Robotics Research, 2025
2025
-
[12]
Learning to walk in minutes using massively parallel deep reinforcement learning,
N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on robot learning. PMLR, 2022
2022
-
[13]
Pedipulate: Enabling manipulation skills using a quadruped robot’s leg,
P. Arm, M. Mittal, H. Kolvenbach, and M. Hutter, “Pedipulate: Enabling manipulation skills using a quadruped robot’s leg,” inIEEE Conference on Robotics and Automation (ICRA 2024), 2024
2024
-
[14]
Demonstrating MOSART: Opening Articulated Structures in the Real World,
A. Gupta, M. Zhang, R. Sathua, and S. Gupta, “Demonstrating MOSART: Opening Articulated Structures in the Real World,” in Robotics: Science and Systems, LosAngeles, CA, USA, 2025
2025
-
[15]
Homer: Learn- ing in-the-wild mobile manipulation via hybrid imitation and whole- body control,
P. Sundaresan, R. Malhotra, P. Miao, J. Yang, J. Wu, H. Hu, R. Antonova, F. Engelmann, D. Sadigh, and J. Bohg, “Homer: Learning in-the-wild mobile manipulation via hybrid imitation and whole-body control,” arXiv preprint arXiv:2506.01185, 2025
-
[16]
Hommi: Learning whole-body mobile manipulation from human demonstrations,
X. Xu, J. Park, H. Zhang, E. Cousineau, A. Bhat, J. Barreiros, D. Wang, and S. Song, “Hommi: Learning whole-body mobile manipulation from human demonstrations,” 2026
2026
-
[17]
HRL4IN: hierarchical reinforcement learning for interactive navigation with mobile manipula- tors,
C. Li, F. Xia, R. Mart ´ın-Mart´ın, and S. Savarese, “HRL4IN: hierarchical reinforcement learning for interactive navigation with mobile manipula- tors,” inCoRL, ser. PMLR, 2019
2019
-
[18]
Relmogen: Integrating motion generation in reinforcement learning for mobile manipulation,
F. Xia, C. Li, R. Mart ´ın-Mart´ın, O. Litany, A. Toshev, and S. Savarese, “Relmogen: Integrating motion generation in reinforcement learning for mobile manipulation,” inIEEE International Conference on Robotics and Automation (ICRA), 2021
2021
-
[19]
Har- monic mobile manipulation
R. Yang, Y . Kim, A. Kembhavi, X. Wang, and K. Ehsani, “Harmonic mobile manipulation,”arXiv preprint arXiv:2312.06639, 2023
-
[20]
Synthesis of whole-body behaviors through hierarchical control of behavioral primitives,
L. Sentis and O. Khatib, “Synthesis of whole-body behaviors through hierarchical control of behavioral primitives,”International Journal of Humanoid Robotics, 2005
2005
-
[21]
Hierarchical quadratic programming: Fast online humanoid-robot motion generation,
A. Escande, N. Mansard, and P.-B. Wieber, “Hierarchical quadratic programming: Fast online humanoid-robot motion generation,”The International Journal of Robotics Research, 2014
2014
-
[22]
Articulated object interaction in unknown scenes with whole-body mobile manipu- lation,
M. Mittal, D. Hoeller, F. Farshidian, M. Hutter, and A. Garg, “Articulated object interaction in unknown scenes with whole-body mobile manipu- lation,” inIEEE/RSJ international conference on intelligent robots and systems (IROS), 2022
2022
-
[23]
Perceptive model predictive control for con- tinuous mobile manipulation,
J. Pankert and M. Hutter, “Perceptive model predictive control for con- tinuous mobile manipulation,”IEEE Robotics and Automation Letters, 2020
2020
-
[24]
A collision-free mpc for whole-body dynamic locomotion and manipula- tion,
J.-R. Chiu, J.-P. Sleiman, M. Mittal, F. Farshidian, and M. Hutter, “A collision-free mpc for whole-body dynamic locomotion and manipula- tion,” inInternational Conference on Robotics and Automation (ICRA), 2022
2022
-
[25]
Whole-body model predictive control for mobile manipulation with task priority transition,
Y . Wang, R. Chen, and M. Zhao, “Whole-body model predictive control for mobile manipulation with task priority transition,” inIEEE International Conference on Robotics and Automation (ICRA), 2025
2025
-
[26]
Versatile multicontact planning and control for legged loco-manipulation,
J.-P. Sleiman, F. Farshidian, and M. Hutter, “Versatile multicontact planning and control for legged loco-manipulation,”Science Robotics, 2023
2023
-
[27]
Active-perceptive motion generation for mobile manipulation,
S. Jauhri, S. Lueth, and G. Chalvatzaki, “Active-perceptive motion generation for mobile manipulation,” inIEEE International Conference on Robotics and Automation (ICRA), 2024
2024
-
[28]
Multi-skill mobile manipula- tion for object rearrangement,
J. Gu, D. S. Chaplot, H. Su, and J. Malik, “Multi-skill mobile manipula- tion for object rearrangement,” inInternational Conference on Learning Representations (ICLR), 2023
2023
-
[29]
N 2m2: Learning navigation for arbitrary mobile manipulation motions in unseen and dynamic environments,
D. Honerkamp, T. Welschehold, and A. Valada, “N 2m2: Learning navigation for arbitrary mobile manipulation motions in unseen and dynamic environments,”IEEE Transactions on Robotics, 2023
2023
-
[30]
Adaptive mobile manipulation for articulated objects in the open world,
H. Xiong, R. Mendonca, K. Shaw, and D. Pathak, “Adaptive mobile manipulation for articulated objects in the open world,”arXiv preprint arXiv:2401.14403, 2024
-
[31]
Whole-body end- effector pose tracking,
T. Portela, A. Cramariuc, M. Mittal, and M. Hutter, “Whole-body end- effector pose tracking,” inIEEE International Conference on Robotics and Automation (ICRA), 2025
2025
-
[32]
Learning a diffusion model policy from rewards via q-score matching,
M. Psenka, A. Escontrela, P. Abbeel, and Y . Ma, “Learning a diffusion model policy from rewards via q-score matching,” inInternational Conference on Machine Learning, ser. Machine Learning Research. PMLR, 2024
2024
-
[33]
Bayesian imitation learning for end-to-end mobile manipulation,
Y . Du, D. Ho, A. Alemi, E. Jang, and M. Khansari, “Bayesian imitation learning for end-to-end mobile manipulation,” inInternational Confer- ence on Machine Learning. PMLR, 2022
2022
-
[34]
Momanipvla: Transfer- ring vision-language-action models for general mobile manipulation,
Z. Wu, Y . Zhou, X. Xu, Z. Wang, and H. Yan, “Momanipvla: Transfer- ring vision-language-action models for general mobile manipulation,” in Computer Vision and Pattern Recognition Conference, 2025
2025
-
[35]
AC-dit: Adaptive coordination diffusion transformer for mobile manipulation,
S. Chen, J. Liu, S. Qian, H. Jiang, Z. Liu, C. Gu, X. Li, C. Hou, P. Wang, Z. Wang, R. Zhang, and S. Zhang, “AC-dit: Adaptive coordination diffusion transformer for mobile manipulation,” inAnnual Conference on Neural Information Processing Systems, 2025
2025
-
[36]
Deep imitation learning for humanoid loco-manipulation through human teleoperation,
M. Seo, S. Han, K. Sim, S. H. Bang, C. Gonzalez, L. Sentis, and Y . Zhu, “Deep imitation learning for humanoid loco-manipulation through human teleoperation,” inIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2023
2023
-
[37]
UMI on legs: Making manipulation policies mobile with manipulation-centric whole-body controllers,
H. Ha, Y . Gao, Z. Fu, J. Tan, and S. Song, “UMI on legs: Making manipulation policies mobile with manipulation-centric whole-body controllers,” inConference on Robot Learning, 2024
2024
-
[38]
The role of embodiment in intuitive whole- body teleoperation for mobile manipulation,
S. B. Moyen, R. Krohn, S. Lueth, K. Pompetzki, J. Peters, V . Prasad, and G. Chalvatzaki, “The role of embodiment in intuitive whole- body teleoperation for mobile manipulation,” inIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2025
2025
-
[39]
Whole-body teleoperation for mobile manipulation at zero added cost,
D. Honerkamp, H. Mahesheka, J. O. von Hartz, T. Welschehold, and A. Valada, “Whole-body teleoperation for mobile manipulation at zero added cost,”IEEE Robotics and Automation Letters, 2025
2025
-
[40]
Opt2skill: Imitating dynamically-feasible whole- body trajectories for versatile humanoid loco-manipulation,
F. Liu, Z. Gu, Y . Cai, Z. Zhou, H. Jung, J. Jang, S. Zhao, S. Ha, Y . Chen, D. Xu, and Y . Zhao, “Opt2skill: Imitating dynamically-feasible whole- body trajectories for versatile humanoid loco-manipulation,”IEEE Robotics and Automation Letters, 2025
2025
-
[41]
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,”arXiv preprint arXiv:2005.01643, 2020
work page internal anchor Pith review arXiv 2005
-
[42]
Is value learning really the main bottleneck in offline rl?
S. Park, K. Frans, S. Levine, and A. Kumar, “Is value learning really the main bottleneck in offline rl?”Advances in Neural Information Processing Systems, 2024
2024
-
[43]
Offline reinforcement learning with implicit q-learning,
I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,” inInternational Conference on Learning Representations, 2022
2022
-
[44]
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
X. B. Peng, A. Kumar, G. Zhang, and S. Levine, “Advantage-weighted regression: Simple and scalable off-policy reinforcement learning,”arXiv preprint arXiv:1910.00177, 2019
work page internal anchor Pith review arXiv 1910
-
[45]
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
P. Hansen-Estruch, I. Kostrikov, M. Janner, J. G. Kuba, and S. Levine, “Idql: Implicit q-learning as an actor-critic method with diffusion policies,”arXiv preprint arXiv:2304.10573, 2023
work page internal anchor Pith review arXiv 2023
-
[46]
Using non-expert data to robustify imitation learning via offline rein- forcement learning,
K. Huang, R. Scalise, C. Winston, A. Agrawal, Y . Zhang, R. Baijal, M. Grotz, B. Boots, B. Burchfiel, M. Itkina, P. Shah, and A. Gupta, “Using non-expert data to robustify imitation learning via offline rein- forcement learning,”Under Review, 2025
2025
-
[47]
Sprinql: Sub-optimal demon- strations driven offline imitation learning,
H. Hoang, T. Mai, and P. Varakantham, “Sprinql: Sub-optimal demon- strations driven offline imitation learning,”Advances in Neural Informa- tion Processing Systems, 2024
2024
-
[48]
Learning multimodal behaviors from scratch with diffusion policy gradient,
S. Li, R. Krohn, T. Chen, A. Ajay, P. Agrawal, and G. Chalvatzaki, “Learning multimodal behaviors from scratch with diffusion policy gradient,”Advances in Neural Information Processing Systems, 2024
2024
-
[49]
Reinforcement learning with action chunking,
Q. Li, Z. Zhou, and S. Levine, “Reinforcement learning with action chunking,” inAnnual Conference on Neural Information Processing Systems, 2025
2025
-
[50]
D. Tian, O. Celik, and G. Neumann, “Chunking the critic: A transformer-based soft actor-critic with n-step returns,”arXiv preprint arXiv:2503.03660, 2025
-
[51]
TOP- ERL: Transformer-based off-policy episodic reinforcement learning,
G. Li, D. Tian, H. Zhou, X. Jiang, R. Lioutikov, and G. Neumann, “TOP- ERL: Transformer-based off-policy episodic reinforcement learning,” in International Conference on Learning Representations, 2025. 13
2025
-
[52]
Implementing torque control with high-ratio gear boxes and without joint-torque sensors,
A. D. Prete, N. Mansard, O. E. Ramos, O. Stasse, and F. Nori, “Implementing torque control with high-ratio gear boxes and without joint-torque sensors,” inInt. Journal of Humanoid Robotics, 2016
2016
-
[53]
Gapartnet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts,
H. Geng, H. Xu, C. Zhao, C. Xu, L. Yi, S. Huang, and H. Wang, “Gapartnet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2023
-
[54]
Addressing function approxi- mation error in actor-critic methods,
S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approxi- mation error in actor-critic methods,” inInternational Conference on Machine Learning, 2018
2018
-
[55]
Iterative corresponding geometry: Fusing region and depth for highly efficient 3d tracking of textureless objects,
M. Stoiber, M. Sundermeyer, and R. Triebel, “Iterative corresponding geometry: Fusing region and depth for highly efficient 3d tracking of textureless objects,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.