PACE: Phase-Aware Chunk Execution for Robot Policies with Action Chunking

Junnan Nie (1), Jiayi Li (2), Jiachen Zhang (1), Junyi Lao (1), Chenghao Liu (1), Tianle Zhang (2), Songfang Huang (1) ((1) Peking University, (2) JD Explore Academy)

arxiv: 2606.00537 · v1 · pith:SKBGTECHnew · submitted 2026-05-30 · 💻 cs.RO


Pith reviewed 2026-06-28 18:53 UTC · model grok-4.3

classification 💻 cs.RO

keywords: action chunking, robot policies, execution horizon, phase detection, manipulation trajectories, test-time adaptation

The pith

Robot policies achieve higher success by selecting execution horizons at low-speed points in predicted action chunks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Action chunking lets robot policies predict sequences of future actions but still requires choosing how much of each chunk to execute before querying the policy again with a fresh observation. Fixed execution horizons produce inconsistent results because the right length varies with task and motion phase. PACE detects low-speed transition points directly in the predicted chunk's speed profile and treats those points as replanning boundaries. The method needs no retraining and raises measured success rates in both large-scale simulation and physical robot tests.

Core claim

The paper claims that identifying low-speed transition points in the predicted speed profile of an action chunk and using them as execution horizons improves policy performance by adapting to the phase-dependent kinematic structure of manipulation trajectories.

What carries the argument

Low-speed transition points in the predicted speed profile, used as replanning boundaries.
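A minimal sketch of what such a selection rule could look like. The page does not state the paper's exact detection rule, so everything here is an illustrative assumption rather than the authors' implementation: the function names, the moving-average window, the local-minimum test, the `h_min` floor, and the fallback to the full chunk. It also assumes actions are absolute position targets, so that speed can be read off as the distance between consecutive actions.

```python
import math
from typing import List, Optional, Sequence

Chunk = Sequence[Sequence[float]]  # one row per predicted action


def speed_profile(chunk: Chunk) -> List[float]:
    """Speed of each step, taken as the distance between consecutive actions.

    Assumption: actions are absolute position targets. For delta actions the
    norm of each action would be the natural analogue.
    """
    return [math.dist(chunk[t], chunk[t + 1]) for t in range(len(chunk) - 1)]


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def select_horizon(chunk: Chunk, h_min: int = 2,
                   h_max: Optional[int] = None, window: int = 3) -> int:
    """Number of actions to execute before the next policy query.

    Returns the earliest low-speed valley inside [h_min, h_max], or h_max
    when the smoothed profile has no valley (hypothetical fallback).
    """
    n = len(chunk)
    h_max = n if h_max is None else min(h_max, n)
    speeds = smooth(speed_profile(chunk), window)
    # speeds[i] is the speed of the move from action i to action i + 1.
    for i in range(1, len(speeds) - 1):
        horizon = i + 1  # execute actions 0..i, then replan
        if horizon < h_min or horizon > h_max:
            continue
        if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]:
            return horizon
    return h_max
```

For a one-dimensional toy chunk `[[0], [4], [8], [10], [11], [13], [17], [21]]`, the motion slows around the fourth action and then speeds up again, and `select_horizon` returns 4. A chunk moving at constant speed has no valley, so the sketch falls back to executing the whole chunk.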

If this is right

Average success rate across 50 simulation tasks rises from 57.8 percent to 64.2 percent.
Average real-robot success rate rises from 50.7 percent to 70.4 percent.
Execution length shortens near detected transitions and lengthens during steady motion phases.
The selection rule applies to any existing chunking policy without retraining or internal access.
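The plug-and-play claim can be pictured as a thin wrapper around a black-box policy. The sketch below is hypothetical: `policy`, `step`, and `select_horizon` are placeholder callables, and the `(observation, done)` return convention of `step` is an assumption, not an interface taken from the paper. It only shows the query, execute-prefix, discard-suffix, re-query cycle.

```python
from typing import Callable, List, Sequence, Tuple

Action = Sequence[float]


def run_episode(policy: Callable[[object], List[Action]],
                step: Callable[[Action], Tuple[object, bool]],
                select_horizon: Callable[[List[Action]], int],
                first_obs: object,
                max_steps: int) -> List[int]:
    """Roll out one episode and return the number of actions run per query."""
    obs = first_obs
    executed = 0
    horizons: List[int] = []
    while executed < max_steps:
        chunk = policy(obs)  # the policy is only ever called, never inspected
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        # Clamp the selected horizon to the chunk and to the step budget.
        h = max(1, min(select_horizon(chunk), len(chunk), max_steps - executed))
        done = False
        ran = 0
        for action in chunk[:h]:  # executed prefix
            obs, done = step(action)
            ran += 1
            if done:
                break
        # chunk[h:] is the discarded suffix; the next query uses the fresh obs.
        horizons.append(ran)
        executed += ran
        if done:
            break
    return horizons
```

Passing `lambda chunk: 8` as `select_horizon` reproduces fixed-horizon execution, so a baseline and an adaptive rule can share the same loop and differ only in that one argument.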

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Phase detection from predicted motion profiles could extend to other sequential prediction settings that contain natural pause points.
Combining the selection rule with policy training that shapes speed profiles might produce further gains.
The approach may show different reliability on tasks whose trajectories lack clear low-speed transitions.

Load-bearing premise

Low-speed points detected in the predicted chunk mark suitable places to stop executing and query the policy again.

What would settle it

A side-by-side test on the same tasks and policies where fixed-horizon execution matches or exceeds the success rates obtained by stopping at the detected low-speed points.
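One way to organize such a side-by-side check, sketched under assumptions: `fixed` holds per-task success rates from a fixed-horizon sweep and `pace` holds per-task success rates for the adaptive rule. The function and key names are hypothetical and the numbers in the usage note are placeholders, not results from the paper. The sketch separates two readings of "fixed-horizon execution": the best single horizon shared by all tasks, and a per-task oracle that picks the best horizon for each task after the fact.

```python
from typing import Dict, List


def compare_to_fixed(fixed: Dict[str, Dict[int, float]],
                     pace: Dict[str, float]) -> dict:
    """Compare an adaptive rule with a fixed-horizon sweep on the same tasks."""
    tasks: List[str] = sorted(pace)
    if not tasks:
        raise ValueError("need at least one task")
    # Only horizons evaluated on every task can serve as one deployment rule.
    shared = set.intersection(*(set(fixed[t]) for t in tasks))
    if not shared:
        raise ValueError("no horizon was evaluated on every task")

    def mean_at(h: int) -> float:
        return sum(fixed[t][h] for t in tasks) / len(tasks)

    best_h = max(sorted(shared), key=mean_at)
    fixed_mean = mean_at(best_h)
    pace_mean = sum(pace[t] for t in tasks) / len(tasks)
    # Tasks where some fixed horizon, chosen in hindsight, ties or beats PACE.
    oracle_ties_or_wins = [t for t in tasks if max(fixed[t].values()) >= pace[t]]
    return {
        "best_single_horizon": best_h,
        "fixed_mean": fixed_mean,
        "pace_mean": pace_mean,
        "fixed_matches_or_exceeds": fixed_mean >= pace_mean,
        "oracle_ties_or_wins": oracle_ties_or_wins,
    }
```

With placeholder inputs `fixed = {"task_a": {5: 0.4, 10: 0.8}, "task_b": {5: 0.7, 10: 0.2}}` and `pace = {"task_a": 0.8, "task_b": 0.7}`, no single horizon matches the adaptive rule on average, yet the per-task oracle ties it on both tasks. That gap is the difference between a deployable constant and a horizon tuned per task with hindsight.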

Figures

Figures reproduced from arXiv: 2606.00537 by Junnan Nie (1), Jiayi Li (2), Jiachen Zhang (1), Junyi Lao (1), Chenghao Liu (1), Tianle Zhang (2), Songfang Huang (1) ((1) Peking University, (2) JD Explore Academy).

**Figure 1.** Overview of PACE. Fixed horizons can be unreliable because success varies with H across tasks. PACE selects the executed prefix online from each predicted chunk, improving performance in simulation and real-robot experiments while adapting the horizon in a rollout.

**Figure 2.** PACE framework. PACE selects an execution horizon from low-speed valleys in the predicted chunk’s smoothed speed profile. The robot executes the selected prefix, discards the suffix, and then queries the policy again.

**Figure 3.** Rollout-level behavior of PACE. The rollout is from place_shoe. Top: head and front camera observations at the six replanning timesteps. Middle: selected execution horizons between consecutive queries. Bottom: predicted action chunks along the rollout timeline, where solid segments are executed prefixes and dashed segments are discarded suffixes. Vertical dashed lines mark replanning boundaries.

**Figure 4.** PACE compared with fixed-horizon sweeps. Green curves show the seed-0 diagnostic sweep of fixed-horizon execution as H is varied from 1 to 50 on each task. The red star marks PACE under the full three-seed evaluation: its horizontal coordinate is the mean executed horizon averaged over policy queries, and its vertical coordinate is the PACE success rate. PACE is shown as a point only for visualization and …

**Figure 5.** Successful real-robot rollout on stack_bowls. Blue markers indicate policy-query timesteps, and green labels indicate the execution horizon selected between consecutive queries. PACE selects long horizons during approach and transport, shortens the horizon near contact-sensitive stacking alignment, and expands it again once a coherent motion segment becomes available.

**Figure 6.** Failure case on put_pen_into_pencil_case. The rollout is from the ALOHA robot. Blue markers indicate policy-query timesteps, and green labels indicate selected horizon lengths. The pencil case remains only partially opened while the right arm moves the pen toward it, showing a failure of the base policy rather than feedback timing.

**Figure 7.** Expanded training prediction horizon ablation. Each cell shows the success-rate gain of training horizon Htrain (columns) relative to the shortest feasible training horizon for a given evaluation horizon Heval (rows), i.e., Htrain = Heval. Blue indicates a gain, orange a loss; the diagonal is zero by construction. Cells below the diagonal are infeasible and left blank.

**Figure 8.** Initial frames of the RoboChallenge tasks. Left: put_pen_into_pencil_case. Right: stack_bowls. Both tasks are evaluated on an ALOHA robot using the same fine-tuned checkpoint within each task; only the test-time execution rule differs between the baseline and PACE.

**Figure 9.** In-lab place_object_on_plate setup. The task uses five objects (corn, cabbage, green pepper, red pepper, and garlic) and a fixed target plate. A single fine-tuned π0.5 checkpoint is used across all object variants, and each method is evaluated over 5 × 20 real-robot trials.

**Figure 10.** PACE rollout visualization on place_can_basket. The selected execution horizon adapts to the local phase structure of the rollout, with shorter prefixes near the placement transition and longer prefixes during smooth motion segments.

**Figure 11.** PACE rollout visualization on hanging_mug. The execution horizon is shortened around the task transition that requires precise alignment and interaction, and lengthened during more stable motion segments.

**Figure 12.** PACE rollout visualization on scan_object. PACE selects execution horizons from the predicted kinematic profile, allowing the rollout to keep longer open-loop segments when motion is smooth and to replan earlier around phase transitions.

**Figure 13.** PACE rollout visualization on RoboChallenge stack_bowls. The selected execution horizon is longer during smooth approach and transport, and shorter near the stacking phase, reflecting the same phase-aware replanning rule used throughout the paper.

Original abstract

Recent vision-language-action and diffusion-based robot policies often use action chunking, where each policy query predicts a sequence of future actions and the robot executes an open-loop prefix before re-querying. While this interface improves local motion continuity, deployment still requires choosing the execution horizon: how much of each predicted chunk should be executed before acquiring a new observation. However, our experiments show that success is strongly task-dependent and non-monotonic with respect to the execution horizon, making a single constant horizon an unreliable deployment rule. We propose PACE (Phase-Aware Chunk Execution), a training-free test-time execution method that selects the execution horizon online from the predicted chunk itself. PACE exploits the phase-dependent kinematic structure of manipulation trajectories by identifying low-speed transition points in the predicted speed profile and using them as candidate replanning boundaries. Because PACE uses only the predicted action chunk, it is plug-and-play and requires no retraining or access to policy internals. We validate PACE through large-scale evaluations in both simulation and real-robot settings. On 50 RoboTwin2.0 tasks, PACE raises the average success rate from 57.8% to 64.2%. In real-robot experiments on bimanual ALOHA and single-arm Franka platforms, PACE improves the average task score from 60.7 to 77.7 and the average success rate from 50.7% to 70.4%. Ablations and rollout-level analyses show that PACE adapts execution horizons across manipulation phases, shortening near transitions while preserving longer execution during coherent motion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PACE gives a simple training-free rule for picking variable execution horizons in action-chunking policies and reports solid gains on sim and real-robot benchmarks, but the justification for why low-speed points work is thin.

PACE is a test-time method that chooses how much of a predicted action chunk to execute by looking for low-speed points in the speed profile of that chunk. The reported results show it improves average success from 57.8% to 64.2% across 50 RoboTwin2.0 tasks in simulation and from 50.7% to 70.4% on real robots with ALOHA and Franka arms.

The new part is using those low-speed transition points as replanning boundaries without any training or extra models. It makes sense as a way to handle the fact that fixed horizons don't work the same for every task or phase.

The paper does well by testing on a large number of tasks and showing real-robot results. It also includes ablations that suggest the method shortens execution near transitions and keeps longer runs during steady motion. Being training-free and only needing the chunk output is a practical plus for people already using chunking policies.

The main soft spot is the link between low-speed points and actual phase changes. The abstract invokes this kinematic structure but gives no specifics on the detection method or checks against ground-truth phases. There's also no test showing that other variable horizon rules wouldn't give similar gains, so the particular choice might not be the key. The abstract doesn't mention error bars or variance across runs, which leaves the size of the improvement a bit unclear.

Overall this is aimed at practitioners and researchers deploying vision-language-action or diffusion policies on physical robots. Anyone looking for simple ways to improve chunked execution without retraining would find it useful.

The work shows clear thinking on the deployment problem and engages with the literature on action chunking. It deserves a serious referee even though the phase assumption could use more backing.

I would recommend sending it to peer review.

Referee Report

3 major / 2 minor

Summary. The paper proposes PACE, a training-free test-time method for action-chunking robot policies that selects variable execution horizons by identifying low-speed transition points in the open-loop predicted speed profile of each chunk. It claims this exploits phase-dependent kinematic structure in manipulation trajectories, yielding average success-rate gains from 57.8% to 64.2% across 50 RoboTwin2.0 tasks and from 50.7% to 70.4% on real bimanual ALOHA and single-arm Franka platforms, with ablations showing adaptation across phases.

Significance. If the core assumption holds and is properly validated, the result would be significant for deployment of existing chunking policies: it supplies a simple, plug-and-play rule that avoids the task-dependent non-monotonicity of fixed horizons without retraining or policy internals. The scale of the evaluation (50 simulation tasks plus real-robot platforms) and the inclusion of rollout-level analyses are strengths that would support broader adoption if the phase-transition mapping is shown to be reliable rather than an artifact of prediction noise.

major comments (3)

[Method] Method section: the exact detection rule for 'low-speed transition points' (speed threshold, local-minima criterion, derivative sign change, or smoothing parameters) is never stated. This is load-bearing for the central claim, because the reported gains are attributed specifically to these points marking appropriate replanning boundaries rather than to any variable-horizon schedule.
[Experiments / Ablations] Experiments / Ablations: no control is presented that applies an alternative variable-horizon rule (e.g., random lengths within the same range, or lengths chosen by a different statistic of the chunk) and shows that the performance lift disappears. Without this, it remains possible that any non-constant horizon would produce similar gains, undermining the claim that the phase-aware speed-profile rule is responsible for the 6.4 pp and 19.7 pp improvements.
[Results] Results: the reported averages (57.8 % → 64.2 % on 50 tasks; 50.7 % → 70.4 % real-robot) are given without per-task or aggregate error bars, trial counts, or statistical significance tests. Because the central claim is a quantitative improvement whose magnitude is task-dependent and non-monotonic, the absence of these statistics leaves the reliability of the gains unassessable.
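The control requested in the second major comment could be built along the following lines. This is a sketch under assumptions, not something the paper reports: `make_random_horizon_rule` is a hypothetical helper that resamples horizons logged from PACE rollouts, so the control keeps the same set of horizon values while ignoring where the low-speed points fall in each chunk.

```python
import random
from typing import Callable, List, Sequence


def make_random_horizon_rule(pace_horizons: Sequence[int],
                             seed: int = 0) -> Callable[[Sequence], int]:
    """Build a chunk-blind rule that resamples previously logged horizons.

    pace_horizons: horizons recorded while running the adaptive rule
    (hypothetical log). Sampling from this pool preserves their empirical
    distribution but breaks any alignment with motion phases.
    """
    if not pace_horizons:
        raise ValueError("need at least one logged horizon")
    rng = random.Random(seed)  # private generator keeps runs reproducible
    pool: List[int] = list(pace_horizons)

    def rule(chunk: Sequence) -> int:
        h = rng.choice(pool)
        return max(1, min(h, len(chunk)))  # never exceed the current chunk

    return rule
```

If success under this rule stays close to the fixed-horizon baseline while the speed-profile rule keeps its lead, the gain can be attributed to where the boundaries fall rather than to horizon variability alone.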

minor comments (2)

[Abstract / Results] The abstract and results text use both 'task score' and 'success rate' without an explicit definition or mapping between the two metrics.
[Figures] Figure captions for rollout analyses should state the number of trajectories visualized and whether the shown speed profiles are from successful or failed rollouts.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the potential significance of PACE for existing chunking policies. We respond point-by-point to the major comments below.

Referee: [Method] Method section: the exact detection rule for 'low-speed transition points' (speed threshold, local-minima criterion, derivative sign change, or smoothing parameters) is never stated. This is load-bearing for the central claim, because the reported gains are attributed specifically to these points marking appropriate replanning boundaries rather than to any variable-horizon schedule.

Authors: We agree that an explicit algorithmic description is necessary for reproducibility and to substantiate the central claim. The revised manuscript will add a dedicated subsection detailing the low-speed transition point detector, including the precise speed threshold, local-minima criterion, derivative conditions, and any smoothing or filtering parameters.
Revision: yes

Referee: [Experiments / Ablations] Experiments / Ablations: no control is presented that applies an alternative variable-horizon rule (e.g., random lengths within the same range, or lengths chosen by a different statistic of the chunk) and shows that the performance lift disappears. Without this, it remains possible that any non-constant horizon would produce similar gains, undermining the claim that the phase-aware speed-profile rule is responsible for the 6.4 pp and 19.7 pp improvements.

Authors: Our existing ablations already demonstrate that PACE produces phase-dependent horizon adaptation (shortening near transitions) that fixed horizons cannot match, and that success is non-monotonic with constant horizons. Nevertheless, we acknowledge that an explicit random or alternative-statistic variable-horizon control would more directly isolate the contribution of the speed-profile rule. We will add this comparison in the revised experiments section.
Revision: yes

Referee: [Results] Results: the reported averages (57.8 % → 64.2 % on 50 tasks; 50.7 % → 70.4 % real-robot) are given without per-task or aggregate error bars, trial counts, or statistical significance tests. Because the central claim is a quantitative improvement whose magnitude is task-dependent and non-monotonic, the absence of these statistics leaves the reliability of the gains unassessable.

Authors: We agree that error bars, trial counts, and significance testing would strengthen the quantitative claims. The revised manuscript will report per-task and aggregate standard deviations (where multiple trials per task exist), explicit trial counts, and appropriate statistical tests for the reported improvements.
Revision: yes
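As a rough illustration of the kind of reporting discussed in the third exchange, the sketch below checks the percentage-point gains quoted by the referee and computes a Wilson score interval for a success rate. The choice of the Wilson interval is an assumption made here, not the authors' stated plan, and the success and trial counts in the usage note are placeholders.

```python
import math
from typing import Tuple


def pp_gain(before_pct: float, after_pct: float) -> float:
    """Gain in percentage points, rounded to one decimal place."""
    return round(after_pct - before_pct, 1)


def wilson_interval(successes: int, trials: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# The two gains quoted in the report follow from the stated averages.
assert pp_gain(57.8, 64.2) == 6.4
assert pp_gain(50.7, 70.4) == 19.7
```

For a placeholder count of 50 successes in 100 trials, `wilson_interval(50, 100)` is centered on 0.5 and spans roughly 0.40 to 0.60. An interval that wide shows why trial counts matter when judging gains of a few percentage points.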

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper proposes PACE as a heuristic, training-free method that selects variable execution horizons by detecting low-speed points in the open-loop predicted action chunk's speed profile. No equations, fitted parameters, self-citations, or uniqueness theorems are invoked that would reduce the reported empirical gains (e.g., 57.8% to 64.2% success) to quantities defined by the method itself. Performance is measured on external benchmarks (RoboTwin2.0 tasks and real-robot platforms) independent of the heuristic's definition, so the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based on abstract only; full paper unavailable so ledger is minimal. The central claim rests on one domain assumption about trajectory structure.

axioms (1)

Domain assumption: Manipulation trajectories exhibit phase-dependent kinematic structure identifiable via low-speed points in predicted speed profiles.
Directly invoked when PACE identifies transition points from the predicted chunk.

pith-pipeline@v0.9.1-grok · 5863 in / 1271 out tokens · 21873 ms · 2026-06-28T18:53:31.601071+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 26 canonical work pages · 20 internal anchors

[1] Kevin Black et al. π0: A Vision-Language-Action Flow Model for General Robot Control. 2024. doi:10.48550/arXiv.2410.24164

[3] Anthony Brohan et al. RT-1: Robotics Transformer for Real-World Control at Scale. 2022. doi:10.48550/arXiv.2212.06817

[4] Anthony Brohan et al. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. 2023. doi:10.48550/arXiv.2307.15818

[5] Jun Cen et al. WorldVLA: Towards Autoregressive Action World Model. 2025. doi:10.48550/arXiv.2506.21539

[6] Tianxing Chen et al. RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation. 2025. doi:10.48550/arXiv.2506.18088

[7] Cheng Chi et al. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. 2023. doi:10.48550/arXiv.2303.04137

[8] Danny Driess et al. PaLM-E: An Embodied Multimodal Language Model. 2023. doi:10.48550/arXiv.2303.03378

[9] Zipeng Fu et al. Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation. 2024. doi:10.48550/arXiv.2401.02117

[10] Physical Intelligence et al. π0.5: A Vision-Language-Action Model with Open-World Generalization. 2025. doi:10.48550/arXiv.2504.16054

[11] Dong Jing et al. Mixture of Horizons in Action Chunking. 2025. doi:10.48550/arXiv.2511.19433

[12] Alexander Khazatsky et al. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. 2024. doi:10.48550/arXiv.2403.12945

[13] Moo Jin Kim et al. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success. 2025. doi:10.48550/arXiv.2502.19645

[15] Oliver Kroemer et al. “Towards Learning Hierarchical Skills for Multi-Phase Manipulation Tasks”. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). 2015. doi:10.1109/ICRA.2015.7139389

[16] Sang Hyoung Lee, Il Hong Suh, et al. “Autonomous Framework for Segmenting Robot Trajectories of Manipulation Task”. In: (2015). doi:10.1007/s10514-014-9397-9

[17] Songming Liu et al. RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation. 2024. doi:10.48550/arXiv.2410.07864

[18] Open X-Embodiment Collaboration et al. Open X-Embodiment: Robotic Learning Datasets and RT-X Models. 2023. doi:10.48550/arXiv.2310.08864

[19] Karl Pertsch et al. FAST: Efficient Action Tokenization for Vision-Language-Action Models. 2025. doi:10.48550/arXiv.2501.09747

[20] Mustafa Shukor et al. SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics. 2025. doi:10.48550/arXiv.2506.01844

[21] Octo Model Team et al. Octo: An Open-Source Generalist Robot Policy. 2024. doi:10.48550/arXiv.2405.12213

[22] Homer Walke et al. BridgeData V2: A Dataset for Robot Learning at Scale. 2023. doi:10.48550/arXiv.2308.12952

[23] Haoxuan Wang et al. Real-Time Robot Execution with Masked Action Chunking. 2026. doi:10.48550/arXiv.2601.20130

[24] Haoxuan Wang et al. VLA Knows Its Limits. 2026. doi:10.48550/arXiv.2602.21445

[25] Adina Yakefu et al. RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies. 2025. doi:10.48550/arXiv.2510.17950

[26] Seonghyeon Ye et al. World Action Models are Zero-shot Policies. 2026. doi:10.48550/arXiv.2602.15922

[27] Yanjie Ze et al. 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations. 2024. doi:10.48550/arXiv.2403.03954

[28] Tianle Zhang et al. JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy. 2026. doi:10.48550/arXiv.2604.20100

[29] Tony Z. Zhao et al. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. 2023. doi:10.48550/arXiv.2304.13705
