Dynamic Execution Horizon Prediction for Chunk-based Robot Policies

Alán Aspuru-Guzik; Animesh Garg; Arjun Sohal; Florian Shkurti; Kourosh Darvish; Liyu Tao; Miroslav Bogdanovic; Yuchi Zhao

arxiv: 2606.11408 · v1 · pith:LJHWDI4Tnew · submitted 2026-06-09 · 💻 cs.RO


Yuchi Zhao, Miroslav Bogdanovic, Arjun Sohal, Liyu Tao, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, Animesh Garg

Pith reviewed 2026-06-27 12:46 UTC · model grok-4.3

classification 💻 cs.RO

keywords action chunking · robot manipulation · dynamic horizon prediction · reinforcement learning · policy adaptation · open-loop execution · fine-grained tasks


The pith

Dynamic Execution Horizon Prediction adapts chunk execution lengths on frozen policies to raise success on precise robot tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that fixed execution horizons in action-chunking robot policies limit performance on fine-grained tasks because they force open-loop behavior during each chunk. It introduces Dynamic Execution Horizon Prediction, a lightweight branch trained with online reinforcement learning while the base chunk policy remains completely frozen. This branch learns to output variable horizons that shorten during precision stages and lengthen during free-space motion. A sympathetic reader would care because the method improves task success without retraining or inspecting the original policy, offering a modular way to add reactivity to existing chunk-based systems.

Core claim

Dynamic Execution Horizon Prediction (DEHP) trains a lightweight execution-horizon prediction branch using online reinforcement learning while keeping the pretrained chunk policy completely frozen. This makes the method compatible with black-box chunk policies and isolates the effect of adapting the execution horizon from changes to the underlying action generator. DEHP predicts shorter execution horizons during fine-grained stages of the task and longer horizons during free-space motion, balancing the efficiency of open-loop chunk execution with the reactivity of closed-loop single-step control and improving success rates on high-precision and long-horizon manipulation tasks.
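
The execution pattern described in this claim can be sketched as a rollout loop. The following is a minimal illustration under stated assumptions, not the authors' implementation: `chunk_policy`, `horizon_head`, `step`, the one-dimensional state, the chunk length of 8, and the distance threshold are all hypothetical stand-ins chosen so the loop runs on its own.

```python
from typing import Callable, List, Sequence

TARGET = 10.0   # hypothetical goal position in a 1-D toy world
CHUNK_LEN = 8   # assumed maximum chunk length H


def chunk_policy(obs: float) -> Sequence[float]:
    """Stand-in for a frozen chunk policy: it is only queried, never updated."""
    direction = 1.0 if obs < TARGET else -1.0
    return [0.5 * direction] * CHUNK_LEN


def horizon_head(obs: float) -> int:
    """Stand-in for the learned branch: pick a short horizon near the target."""
    return 1 if abs(TARGET - obs) < 3.0 else CHUNK_LEN


def step(obs: float, action: float) -> float:
    """Toy dynamics: the action is a displacement."""
    return obs + action


def run_episode(
    policy: Callable[[float], Sequence[float]],
    head: Callable[[float], int],
    dynamics: Callable[[float, float], float],
    obs: float,
    max_steps: int,
) -> List[int]:
    """Roll out the policy, executing only the first h actions of each chunk."""
    horizons: List[int] = []
    t = 0
    while t < max_steps:
        chunk = policy(obs)                      # query the frozen policy
        h = head(obs)                            # horizon chosen for this chunk
        h = max(1, min(h, len(chunk), max_steps - t))  # clamp to a valid range
        for action in chunk[:h]:                 # open-loop within the chunk
            obs = dynamics(obs, action)
        horizons.append(h)
        t += h                                   # replan after h steps
    return horizons


horizons = run_episode(chunk_policy, horizon_head, step, obs=0.0, max_steps=24)
print(horizons)
```

In this toy, the horizon head is a hand-written rule; in the method described above it would be the trained branch, and the chunk policy would be treated as a black box that is queried once per replanning step.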

What carries the argument

Lightweight execution-horizon prediction branch trained with online RL on a frozen pretrained chunk policy.

If this is right

DEHP applies to any black-box chunk policy without internal changes or retraining.
The predictor selects shorter horizons for fine manipulation and longer ones for free-space motion.
Success rates rise substantially across the tested high-precision and long-horizon tasks.
The separation of horizon prediction from action generation keeps the base policy unchanged.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same lightweight-branch idea could be tested on other policy outputs such as termination signals or uncertainty estimates.
Task-specific horizon tuning might become unnecessary if the RL branch generalizes across related manipulation skills.
Adding the branch after policy training could serve as a low-cost way to retrofit older chunk policies for more reactive use.

Load-bearing premise

A lightweight horizon-prediction branch trained with online RL on a completely frozen pretrained chunk policy can reliably learn task-stage-appropriate horizons without access to or modification of the base policy internals.

What would settle it

Running the same evaluations on high-precision and long-horizon tasks and finding either no rise in success rate or no systematic shortening of horizons during fine-grained stages would falsify the central claim.
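
One way to check for systematic shortening is to log the predicted horizon at every replanning step together with a stage label and compare the per-stage means. The sketch below assumes such labels exist (for example from hand annotation); the stage names, the log format, and the numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_horizon_by_stage(log: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Group logged (stage, horizon) pairs by stage and average the horizons."""
    by_stage: Dict[str, List[int]] = defaultdict(list)
    for stage, horizon in log:
        by_stage[stage].append(horizon)
    return {stage: mean(values) for stage, values in by_stage.items()}


# Hypothetical log entries, for illustration only.
log = [
    ("free_space", 8),
    ("free_space", 6),
    ("fine_grained", 1),
    ("fine_grained", 2),
]
summary = mean_horizon_by_stage(log)
print(summary)
```

If the mean horizon in fine-grained stages were not reliably lower than in free-space stages across many episodes, that would count against the stage-dependent behavior the paper reports.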

Original abstract

Action chunking has become a standard design in modern robot policies, from diffusion/flow policies to vision-language-action models, where the policy predicts a sequence of actions and executes a fixed number of them instead of acting one step at a time. However, this paradigm relies on a key assumption: a fixed execution horizon. During chunk execution, the policy operates open-loop, which is particularly problematic for fine-grained manipulation tasks that require frequent replanning. In practice, the execution horizon is typically chosen through empirical tuning and is highly task-dependent. To this end, we propose Dynamic Execution Horizon Prediction (DEHP), an effective method that trains a lightweight execution-horizon prediction branch using online reinforcement learning while keeping the pretrained chunk policy completely frozen. This makes the method compatible with black-box chunk policies and isolates the effect of adapting the execution horizon from changes to the underlying action generator. Across our evaluations, DEHP improves the success rate of different high-precision and long-horizon manipulation tasks by a large margin. Our qualitative analysis further shows that DEHP predicts shorter execution horizons during fine-grained stages of the task and longer horizons during free-space motion. In this way, DEHP balances the efficiency of open-loop chunk execution with the reactivity of closed-loop single-step control. Project page: https://dehp-chunking.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DEHP isolates horizon prediction in a separate RL branch on a frozen chunk policy, which is a clean modular idea, but the abstract supplies zero experimental evidence so the large-margin claims cannot be assessed.


The main point is that the paper trains a lightweight horizon-prediction branch with online RL while leaving the pretrained chunk policy completely frozen. This design keeps the base action generator untouched and works with any black-box chunk policy, which is the concrete novelty relative to prior fixed-horizon chunking work.

It does one thing cleanly: it separates the decision of how many steps to execute from the generation of those steps. The qualitative claim that the branch learns shorter horizons in fine-grained phases and longer ones in free-space motion is consistent with the stated goal of balancing open-loop efficiency and closed-loop reactivity.

The soft spot is the total lack of experimental substance. The abstract asserts large-margin success-rate gains on high-precision and long-horizon tasks but gives no numbers, baselines, ablations, or statistical tests. Without those, there is no way to know whether the reported improvement actually comes from stage-appropriate horizon choices or from some other factor. The concern that a reward-only signal on a frozen policy may be too weak to discover fine-grained distinctions is reasonable given how sparse manipulation rewards usually are.

This is for people already running chunk-based policies (diffusion, flow, or VLA) who want a lightweight way to add variable horizons without retraining the whole model. A reader looking for practical deployment tricks might pick up the modular pattern, but the current text is too thin to evaluate the actual performance.

I would send it to peer review if the full paper contains proper experiments with baselines and ablations; otherwise it needs more data before review.

Referee Report

2 major / 1 minor

Summary. The paper proposes Dynamic Execution Horizon Prediction (DEHP), which adds a lightweight horizon-prediction branch trained via online RL to a frozen pretrained chunk-based robot policy. The central claim is that this yields large-margin success-rate gains on high-precision and long-horizon manipulation tasks by dynamically selecting shorter execution horizons during fine-grained stages and longer horizons during free-space motion, thereby balancing open-loop chunk efficiency with closed-loop reactivity.

Significance. If the experimental results hold, the approach would be a practical contribution for adapting black-box chunk policies (diffusion, flow, or VLA models) without internal modification or retraining. The isolation of the horizon branch and the reported stage-dependent behavior address a real limitation of fixed-horizon chunking in contact-rich tasks.

major comments (2)

[Abstract] Abstract: the claim that DEHP 'improves the success rate ... by a large margin' is presented without any quantitative results, baselines, trial counts, statistical tests, or ablation studies. This is load-bearing for the central claim and prevents assessment of whether gains are attributable to dynamic horizons rather than other factors.
[Method] Method / Training description: the horizon branch is trained solely with downstream task reward while the chunk policy remains completely frozen and inaccessible. Given that manipulation rewards are typically sparse and delayed, it is unclear how the branch can reliably discover the fine-grained vs. free-space distinction asserted in the qualitative analysis; no auxiliary losses, feature access, or shaping are described that would supply the necessary signal.

minor comments (1)

[Abstract] The abstract and introduction would benefit from a concise statement of the specific tasks, robot platforms, and base policies used in the evaluations.
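
The first major comment asks for trial counts and statistical support behind the success-rate claim. As a hedged illustration of what such reporting could look like, the sketch below computes a Wilson score interval for a success rate. The function name and the example counts are hypothetical; the paper's actual numbers are not available in the abstract.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts, for illustration only: 30 successes in 50 trials.
low, high = wilson_interval(30, 50)
print(f"success rate 0.60, approx. 95% interval [{low:.3f}, {high:.3f}]")
```

Reporting intervals like this for both the fixed-horizon baseline and the dynamic-horizon variant would let a reader judge whether the gap is larger than the trial-to-trial noise.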

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions.


Referee: [Abstract] Abstract: the claim that DEHP 'improves the success rate ... by a large margin' is presented without any quantitative results, baselines, trial counts, statistical tests, or ablation studies. This is load-bearing for the central claim and prevents assessment of whether gains are attributable to dynamic horizons rather than other factors.

Authors: We agree that the abstract lacks the quantitative details needed to support the central claim. In the revised manuscript we will incorporate specific success-rate deltas, trial counts, baseline comparisons, and references to the statistical tests and ablations already present in the experimental section.
revision: yes

Referee: [Method] Method / Training description: the horizon branch is trained solely with downstream task reward while the chunk policy remains completely frozen and inaccessible. Given that manipulation rewards are typically sparse and delayed, it is unclear how the branch can reliably discover the fine-grained vs. free-space distinction asserted in the qualitative analysis; no auxiliary losses, feature access, or shaping are described that would supply the necessary signal.

Authors: The horizon branch is trained exclusively with the downstream task reward while the chunk policy stays frozen, exactly as described. The fine-grained versus free-space behavior emerges from the online RL objective: horizon choices that improve task completion receive higher return, and the qualitative results confirm the learned policy exhibits the desired stage-dependent pattern. We will expand the method section with additional discussion of the reward signal and training dynamics to clarify this point.
revision: partial
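
The exchange above turns on whether a sparse terminal reward can, on its own, shape stage-dependent horizon choices. The toy below illustrates the mechanism the authors appeal to, using a REINFORCE-style update on a categorical horizon head. It is a sketch under heavy assumptions and not the paper's training algorithm: the two stages, the candidate horizons, the per-stage success probabilities, and all hyperparameters are invented for illustration.

```python
import math
import random

HORIZONS = [1, 4, 8]                       # assumed candidate horizons
STAGES = ["free_space", "fine_grained"]    # assumed two-stage toy task
# Hypothetical success probability of each stage given the horizon used in it.
P_SUCCESS = {
    "free_space": {1: 0.50, 4: 0.80, 8: 0.95},
    "fine_grained": {1: 0.95, 4: 0.60, 8: 0.20},
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=20000, lr=0.05, seed=0):
    """Policy-gradient training of per-stage horizon logits from a 0/1 reward."""
    rng = random.Random(seed)
    logits = {s: [0.0] * len(HORIZONS) for s in STAGES}
    baseline = 0.0                          # running mean of the reward
    for _ in range(episodes):
        choices = {}
        for s in STAGES:
            probs = softmax(logits[s])
            choices[s] = rng.choices(range(len(HORIZONS)), weights=probs)[0]
        # Sparse terminal reward: 1 only if every stage succeeds.
        reward = 1.0
        for s in STAGES:
            if rng.random() > P_SUCCESS[s][HORIZONS[choices[s]]]:
                reward = 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        for s in STAGES:
            probs = softmax(logits[s])
            for k in range(len(HORIZONS)):
                indicator = 1.0 if k == choices[s] else 0.0
                # Gradient of log softmax probability of the sampled horizon.
                logits[s][k] += lr * advantage * (indicator - probs[k])
    return {s: softmax(logits[s]) for s in STAGES}


policy = train()
for stage in STAGES:
    print(stage, [round(p, 3) for p in policy[stage]])
```

The toy only shows that an episode-level success signal can separate horizon preferences when the per-stage effect on success is large. It does not address the referee's concern about delayed rewards in long, multi-stage manipulation episodes, where credit assignment is considerably harder.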

Circularity Check

0 steps flagged

No circularity: empirical RL training on frozen policy with external task rewards


The paper introduces DEHP as a lightweight horizon-prediction branch trained via online RL on a completely frozen pretrained chunk policy. The central claim of success-rate gains rests on empirical evaluations across tasks rather than any mathematical derivation or self-referential definition. No equations are presented that equate a 'prediction' to a fitted input by construction, and no load-bearing self-citations or uniqueness theorems reduce the method to its own inputs. The training signal (downstream task reward) is external to the branch's output, satisfying the non-circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields minimal ledger entries; the approach relies on standard RL training assumptions and the domain premise that variable horizons are beneficial.

axioms (1)

domain assumption: Fixed execution horizons are suboptimal for fine-grained manipulation tasks requiring replanning.
Core motivation stated in the first paragraph of the abstract.

pith-pipeline@v0.9.1-grok · 5792 in / 1088 out tokens · 22710 ms · 2026-06-27T12:46:54.159087+00:00 · methodology


