pith.

arxiv: 2606.32026 · v1 · pith:AB7AJXV3 · submitted 2026-06-30 · 💻 cs.LG · cs.AI

AdaJEPA: An Adaptive Latent World Model

Pith reviewed 2026-07-01 05:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: latent world models · test-time adaptation · model predictive control · self-supervised learning · goal-reaching tasks · adaptive planning · closed-loop control

The pith

AdaJEPA adapts a latent world model at test time inside model predictive control by treating observed next states as self-supervised signals for recalibration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Latent world models support planning from high-dimensional observations by forecasting in a compact space, yet they are typically held fixed after training and degrade when real conditions diverge from the training data. AdaJEPA instead executes the first chunk of a planned action sequence, measures the actual next state, and updates the model with a gradient step on that transition before replanning. The update uses only the observed transition and requires no extra demonstrations. The loop needs as few as one gradient step per replanning cycle and raises success rates on goal-reaching tasks. Readers would care because the method turns execution feedback into an ongoing correction process rather than relying on a static model.

Core claim

After training, AdaJEPA plans and executes the first action chunk, uses the observed next-state transition as a self-supervised adaptation signal, and replans with the updated model. This closed-loop update continuously recalibrates the world model without additional expert demonstrations. Across a range of goal-reaching tasks, AdaJEPA substantially improves planning success with as few as one gradient step per MPC replanning step.
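Written as an update rule, the adaptation step amounts to a gradient step on the prediction error for the newly observed transition. The notation below is assumed for illustration and is not given on this page: an encoder E_φ, a latent predictor f_θ, a squared-L2 loss, a stop-gradient sg(·) on the target embedding, and a step size η. The paper's actual loss, target handling, and choice of adapted parameters may differ.

```latex
\mathcal{L}_t(\theta) = \tfrac{1}{2}\,\bigl\lVert f_\theta\bigl(E_\phi(o_t),\, a_t\bigr) - \operatorname{sg}\bigl(E_\phi(o_{t+1})\bigr) \bigr\rVert_2^2,
\qquad
\theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}_t(\theta)
```

With k gradient steps per replanning cycle, the second expression is applied k times before the planner runs again; the claim above is that k = 1 already helps.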

What carries the argument

The closed-loop test-time adaptation step that recalibrates the latent world model from the observed next-state transition after each executed action chunk.
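A minimal, self-contained sketch of that plan, execute, adapt, replan loop follows. Everything in it is a hypothetical stand-in chosen so the loop fits on one screen: states are used directly as latents (no encoder), the predictor is linear (`LinearLatentModel`), the planner is random shooting rather than whatever optimizer the paper uses, and the step size is normalized by the input norm (an NLMS-style choice, an assumption). Only the control flow mirrors the description above: execute one action, take one gradient step on the observed transition, then replan.

```python
import numpy as np


class LinearLatentModel:
    """Hypothetical stand-in for a latent predictor: z_next = A @ z + B @ a."""

    def __init__(self, A, B):
        self.A = np.array(A, dtype=float)
        self.B = np.array(B, dtype=float)

    def predict(self, z, a):
        return self.A @ z + self.B @ a

    def adapt(self, z, a, z_next, lr=0.5):
        """One gradient step on 0.5 * ||A z + B a - z_next||^2; returns the pre-step loss."""
        r = self.predict(z, a) - z_next
        # Normalizing by ||z||^2 + ||a||^2 (NLMS-style, an assumption) shrinks the
        # residual on this transition by a factor of about (1 - lr).
        step = lr / (1e-8 + z @ z + a @ a)
        self.A -= step * np.outer(r, z)
        self.B -= step * np.outer(r, a)
        return float(0.5 * r @ r)


def plan(model, z0, goal, horizon=5, n_samples=256, rng=None):
    """Random-shooting planner: return the sampled action sequence whose
    predicted final latent is closest to the goal."""
    rng = np.random.default_rng() if rng is None else rng
    act_dim = model.B.shape[1]
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, act_dim))
    costs = np.empty(n_samples)
    for i, seq in enumerate(seqs):
        z = z0
        for a in seq:
            z = model.predict(z, a)  # roll out the current (possibly adapted) model
        costs[i] = np.sum((z - goal) ** 2)
    return seqs[np.argmin(costs)]


def run_episode(model, A_true, B_true, z0, goal, steps=20, adapt=True, seed=0):
    """Closed-loop MPC; with adapt=True, one gradient step per replanning step."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        a = plan(model, z, goal, rng=rng)[0]  # plan, keep only the first action
        z_next = A_true @ z + B_true @ a      # execute in the (possibly shifted) environment
        if adapt:
            model.adapt(z, a, z_next)         # adapt on the observed transition
        z = z_next                            # the next iteration replans from here
    return float(np.linalg.norm(z - goal))
```

A hypothetical sign-flip shift can be simulated by building the model with `B = np.eye(2)` and running the environment with `B_true = -np.eye(2)`; comparing `adapt=False` against `adapt=True` on fresh model instances (the adaptive run mutates the model in place) is the kind of frozen-versus-adapted comparison the paper runs, without implying anything about its actual numbers.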

If this is right

  • Planning success on goal-reaching tasks rises when the model is allowed to update from real observations inside the control loop.
  • Effective recalibration occurs with only one gradient step per MPC replanning cycle.
  • No extra expert demonstrations are required to maintain model accuracy at test time.
  • The approach keeps the world model usable under test-time distribution shift without freezing parameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same self-supervised update loop could be applied to other online control settings where a predictive model drifts over time.
  • If the adaptation step proves stable across longer horizons, it might reduce the need for exhaustive offline data collection in robotics.
  • Noise in the observed transition could limit reliability unless the method incorporates filtering or selective updates.
  • Extending the single-step update to multi-step chunks might trade off speed for greater correction per cycle.
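The filtering idea in the third bullet can be sketched as a gate placed in front of the update. This is a hypothetical extension, not part of AdaJEPA as described here: `SelectiveUpdateGate`, its window, and its ratio threshold are illustrative choices. It skips a gradient step when the current transition's prediction error is far above the median of recent errors, but still records that error, so a sustained shift (as opposed to an isolated spike) eventually raises the median and lets updates resume.

```python
from collections import deque

import numpy as np


class SelectiveUpdateGate:
    """Hypothetical gate: skip adaptation on transitions whose prediction error
    exceeds `ratio` times the median error over a recent window."""

    def __init__(self, window=20, ratio=5.0, warmup=3):
        self.errors = deque(maxlen=window)
        self.ratio = ratio
        self.warmup = warmup

    def should_update(self, error):
        if len(self.errors) < self.warmup:
            ok = True  # too little history to judge: always adapt
        else:
            ok = error <= self.ratio * float(np.median(self.errors))
        # Record every error, including skipped ones, so that a sustained shift
        # (rather than an isolated spike) eventually moves the median up.
        self.errors.append(float(error))
        return bool(ok)
```

In a control loop, one would call `should_update` on the current transition's prediction error and take the gradient step only when it returns True. With `window=10`, `ratio=5.0`, and `warmup=3`, five errors of 1.0 followed by a spike of 100.0 and then 1.2 give True five times, False for the spike, and True for 1.2; a run of errors at 10.0 afterwards is skipped at first and accepted once elevated errors fill roughly half the window.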

Load-bearing premise

The observed next-state transition after executing the first action chunk supplies a reliable self-supervised signal that permits useful model updates without causing instability.
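Whether a single step on one observed transition is useful without causing instability can be made concrete in a deliberately simple case. Assume, for illustration only and not as the paper's model, a linear predictor W with input x (for example, a concatenated latent and action), target y, and squared error. One gradient step with step size η then has a closed form:

```latex
\begin{aligned}
\ell(W) &= \tfrac{1}{2}\,\lVert W x - y \rVert_2^2, \qquad r = W x - y, \qquad \nabla_W \ell = r\,x^\top, \\
W' &= W - \eta\, r\, x^\top \;\Longrightarrow\; r' = W' x - y = \bigl(1 - \eta\,\lVert x \rVert_2^2\bigr)\, r, \\
\ell(W') &= \bigl(1 - \eta\,\lVert x \rVert_2^2\bigr)^2\, \ell(W).
\end{aligned}
```

So the loss on that transition decreases if and only if 0 < η < 2/‖x‖², and at η = 1/‖x‖² the step fits the transition exactly. If the observed target is noisy, y = y* + ε, that exact fit absorbs ε entirely, which is one concrete way the noise concern raised above could bite; this calculation says nothing about the effect on other transitions.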

What would settle it

Run the same goal-reaching tasks under distribution shift with and without adaptation; if planning success with one gradient step per replan is no higher than that of the frozen model, the adaptation benefit is absent.
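The test above reduces to comparing two success rates, and the comparison is only meaningful with an uncertainty estimate. One hedged way to get one is a paired bootstrap over episodes, assuming each episode index uses the same start, goal, and environment seed under both the frozen and the adapted model. `success_rate_gap` and its defaults are illustrative, not taken from the paper.

```python
import numpy as np


def success_rate_gap(frozen, adapted, n_boot=10_000, alpha=0.05, seed=0):
    """Paired bootstrap CI for the success-rate difference (adapted - frozen).

    frozen, adapted: 0/1 outcomes, one per episode; entry i of each array must
    come from the same start/goal/seed so the episodes are paired.
    """
    frozen = np.asarray(frozen, dtype=float)
    adapted = np.asarray(adapted, dtype=float)
    if frozen.shape != adapted.shape:
        raise ValueError("frozen and adapted outcomes must be paired episode by episode")
    diff = adapted - frozen                          # per-episode gain: -1, 0, or +1
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))  # resample episodes
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)
```

If the lower end of the interval is at or below zero, those episodes do not support an adaptation benefit at that confidence level; a clearly positive interval supports one for that setting only.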

Figures

Figures reproduced from arXiv: 2606.32026 by Mengye Ren, Oumayma Bounou, Yann LeCun, Ying Wang.

Figure 1: AdaJEPA performs test-time adaptation during closed-loop MPC. At each MPC step, we plan with the current model, execute the first action a_t, collect observation o_{t+1} from the environment, and update the model to minimize the prediction error on the newly observed transition {o_t, a_t, o_{t+1}} before replanning. This yields a simple plan–execute–adapt–replan loop that continually recalibrates the model to tr…
Figure 2: Planning Success under Shape Shifts (top) and Visual Shifts (bottom). The ★ denotes unseen shapes and configurations. AdaJEPA consistently improves planning success across all settings, using only a single adaptation step per MPC replanning step. We extend the maximum number of steps to 30 to show the increasing trend of planning success of AdaJEPA. Comparison between frozen and AdaJEPA planning trajector…
Figure 4: PointMaze-Medium Dynamics-Shift Planning Trajectories. The green polylines trace the agent’s position over time, the blue square marks the end, and the gold star marks the goal. Under in-distribution dynamics the model reaches the goal. However, the frozen world model mispredicts and planning fails under dynamics shifts, while test-time adaptation realigns with the new environment and recovers success. (a1) Ma…
Figure 5: Diverse Maze Planning Trajectories. We use the same visual conventions as in …
Figure 7: Examples of AdaJEPA Planning Trajectories under Visual Shifts and Shape Shifts. The decoder is trained on pretrained representations; i.e. (a) original PushT data with a gray pushed block; (b) PushObj data which only includes {T, L, Z, +} while square is an unseen shape. … compared with 45.8% when all trajectories come from a single shape (K = 1, N = 16k). Test-time adaptation improves success rates across scal…
Figure 6: Effect of training data scale on PushObj planning success: shape diversity K and trajectories per shape N. However, as we only perform lightweight correction during planning, its effectiveness is also bounded by the coverage of the pretrained representation: when the test environment requires features absent from training, adaptation can improve planning but may not fully close the gap. A natural next s…
Figure 8: Effect of Adaptation Layers to Planning Success Rates. The reported values are the per-shift success rates (%) averaged over all setups within each shift. Test-time adaptation improves planning across all distribution shifts and is largely insensitive to which layers are adapted.
Figure 9: Effect of Adaptation Hyperparameters and Replay Buffers to Planning Success Rates. What to adapt. In general, AdaJEPA is robust to the choice of adaptation target. As shown in …
Figure 10: Heatmap of offline training data scale vs. PushObj planning success. Rows vary shape diversity K and columns vary trajectories per shape N. … substantially outperforms the frozen model. We use a recent sliding-window buffer in the main experiments, as it provides the most stable gains. B.3 Additional Experiments: Training Data Scale for PushObj. We study how training-data size and diversity affect test-time…
Figure 11: The model is trained on shapes {T, L, +, Z} and tested here on the seen shape +.
Figure 12: Shape Shifts: The model is trained on shapes {T, L, Z, +}, and tested on an unseen shape smallT. The decoder tries to decode to the closest seen shapes.
Figure 13: Visual Shifts: The model is trained on the default PushT data, but the test-time observations have salt-and-pepper noise. AdaJEPA consistently decreases prediction loss.
Figure 14: Visual Shifts: The model is trained on the default PushT data (agent is blue), but the agent is red at test time. AdaJEPA consistently decreases prediction loss.
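The Figure 9 and Figure 10 captions mention replay buffers and a "recent sliding-window buffer" used in the main experiments. A minimal sketch of that idea follows, under assumptions not stated on this page: the capacity, the hypothetical linear predictor with matrices A and B, and a mean-squared loss over the window are all illustrative. The buffer keeps only the last few transitions, and the adaptation step uses all of them at once rather than the newest one alone.

```python
from collections import deque

import numpy as np


class SlidingWindowBuffer:
    """Keep only the most recent `capacity` transitions (o_t, a_t, o_{t+1})."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest entries are evicted automatically

    def add(self, obs, action, next_obs):
        self.buf.append((np.asarray(obs, float), np.asarray(action, float),
                         np.asarray(next_obs, float)))

    def batch(self):
        obs, act, nxt = zip(*self.buf)
        return np.stack(obs), np.stack(act), np.stack(nxt)

    def __len__(self):
        return len(self.buf)


def adapt_on_buffer(A, B, buffer, lr):
    """One gradient step on (1/2n) * sum_i ||A z_i + B a_i - z'_i||^2 over the window
    (hypothetical linear predictor; returns updated copies of A and B)."""
    Z, U, Zn = buffer.batch()
    R = Z @ A.T + U @ B.T - Zn  # one residual row per stored transition
    n = len(buffer)
    return A - lr * (R.T @ Z) / n, B - lr * (R.T @ U) / n
```

A bounded `deque` drops the oldest transition automatically, so the window tracks the current dynamics while still averaging over more than one possibly noisy observation.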
Original abstract

Latent world models enable planning from high-dimensional observations by predicting future states in a compact latent space. However, these models are typically kept frozen at test time: when their predictions become inaccurate, planning can fail, especially under test-time distribution shift. To address this, we propose AdaJEPA, an adaptive latent world model that performs test-time adaptation within the closed loop of model predictive control (MPC). After training, AdaJEPA plans and executes the first action chunk, uses the observed next-state transition as a self-supervised adaptation signal, and replans with the updated model. This closed-loop update continuously recalibrates the world model without additional expert demonstrations. Across a range of goal-reaching tasks, AdaJEPA substantially improves planning success with as few as one gradient step per MPC replanning step.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes AdaJEPA, a latent world model that performs test-time adaptation inside the MPC loop: after executing the first action chunk it treats the observed next-state transition as a self-supervised signal, performs as few as one gradient step on the world model, and replans with the updated parameters. The central claim is that this closed-loop recalibration yields substantially higher planning success on goal-reaching tasks under distribution shift, without extra expert demonstrations.

Significance. If the empirical gains are reproducible and statistically robust, the approach would be a lightweight, practical extension of existing latent-world-model + MPC pipelines that directly mitigates test-time model mismatch. That gains are reported with a single gradient step per replan makes it attractive for real-time settings.

major comments (1)
  1. Abstract: the assertion that AdaJEPA 'substantially improves planning success' is presented without any quantitative results, success rates, baselines, error bars, or experimental protocol. Because the paper's contribution is framed as an empirical improvement, this omission is load-bearing for the central claim and prevents assessment of effect size or statistical reliability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the feedback. We agree that the abstract's central empirical claim requires quantitative support to be properly assessed, and we will revise it to include key results.

Point-by-point responses
  1. Referee: Abstract: the assertion that AdaJEPA 'substantially improves planning success' is presented without any quantitative results, success rates, baselines, error bars, or experimental protocol. Because the paper's contribution is framed as an empirical improvement, this omission is load-bearing for the central claim and prevents assessment of effect size or statistical reliability.

    Authors: We accept this point. The current abstract states the improvement only qualitatively. In the revision we will add concrete numbers drawn from the experimental section (e.g., success rates on the goal-reaching tasks, comparison to the frozen JEPA baseline, and reference to the number of gradient steps and statistical variability). This change will make the abstract self-contained for evaluating the claimed effect size. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript describes a procedural test-time adaptation loop for latent world models using observed transitions as self-supervised signals within MPC. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations are present in the provided text that would reduce any claim to its own inputs by construction. The approach is a direct extension of standard latent dynamics + planning pipelines, with empirical claims left open to external validation rather than internal self-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that one gradient step on observed transitions produces stable and useful model updates.

pith-pipeline@v0.9.1-grok · 5667 in / 1212 out tokens · 24727 ms · 2026-07-01T05:59:55.783540+00:00 · methodology

