Conformal Orbit-Valid Trust Horizons for Equivariant World Models

Hongbo Wang

arxiv: 2606.24946 · v1 · pith:AODVGI42new · submitted 2026-06-23 · 💻 cs.LG · cs.RO


Pith reviewed 2026-06-26 00:28 UTC · model grok-4.3

classification: cs.LG · cs.RO

keywords: conformal prediction · equivariant models · world models · trust horizons · group orbits · latent dynamics · rollout certification · symmetry invariance


The pith

Exact equivariance conditions make conformally calibrated trust horizons constant over group orbits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper gives a method to certify trust horizons for latent world models that respect known group symmetries. It forms a raw horizon curve from a one-step latent residual and a finite-time expansion estimate, then calibrates it with a split-conformal multiplicative factor on an audit set. The central result is that when the dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance or invariance conditions, the calibrated curve transports unchanged across the group orbit. This matters because a single calibration sector can then cover all symmetric variants without additional audits. Experiments report zero anti-conservative violations across 50 audits, orbit-transport residuals with median 1.1% and maximum 4.1%, and non-vacuous certificates in both 2D and 3D settings.

Core claim

Given a one-step latent residual and a finite-time expansion estimate, a raw horizon curve is formed and calibrated with a split-conformal multiplicative factor. The main structural result is that exact equivariance transports a calibrated trust-horizon curve over the group orbit: when the environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions, rollout errors and trust horizons are orbit-constant. Empirically the implemented models show small orbit-transport residuals and a non-vacuous median certified-to-measured horizon ratio of 0.67.
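The calibration step described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since this summary does not reproduce the paper's formulas: the raw curve is taken to be a geometric accumulation of a one-step residual `eps` under an expansion rate `lam`, the nonconformity score is the worst measured-to-raw ratio along a rollout, and the factor is floored at 1.0 so that it can only inflate the raw curve.

```python
import math

def raw_error_curve(eps, lam, t_max):
    """Assumed raw bound: e_raw(t) = eps * sum_{k=0}^{t-1} lam**k (illustrative form)."""
    curve, total = [], 0.0
    for k in range(t_max):
        total += eps * lam ** k
        curve.append(total)  # curve[t-1] bounds the error after t steps
    return curve

def conformal_factor(raw_curves, measured_curves, alpha):
    """Split-conformal multiplicative factor, floored at 1.0 (an assumption)."""
    # One score per calibration rollout: the worst measured/raw ratio.
    scores = sorted(
        max(m / r for m, r in zip(meas, raw))
        for raw, meas in zip(raw_curves, measured_curves)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample conformal rank
    if rank > n:
        return math.inf  # too few calibration rollouts for this alpha
    return max(1.0, scores[rank - 1])

def horizon(curve, tol):
    """Number of leading steps for which the curve stays at or below tol."""
    h = 0
    for err in curve:
        if err > tol:
            break
        h += 1
    return h

# Toy audit: measured errors sit at 80% of the raw bound on 40 rollouts.
raw = raw_error_curve(eps=0.01, lam=1.2, t_max=20)
measured = [[0.8 * r for r in raw] for _ in range(40)]
gamma = conformal_factor([raw] * 40, measured, alpha=0.1)
h_conf = horizon([gamma * r for r in raw], tol=0.1)
h_meas = horizon(measured[0], tol=0.1)
print(gamma, h_conf, h_meas)
```

In this toy setup the raw curve already dominates every measured curve, so the factor stays at its floor, which mirrors the reported situation where the raw certificate is conservative before calibration.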

What carries the argument

The equivariance and invariance conditions on dynamics, encoder, predictor, action transform, and latent metric that make rollout errors and trust horizons orbit-constant.
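The way these conditions produce orbit-constant errors can be sketched in a few lines of algebra. The notation is assumed for illustration and is not taken from the paper: f is the environment dynamics, E the encoder, P the latent predictor, ρ_A and ρ_Z the group actions on actions and latents, and d the latent metric.

```latex
\begin{aligned}
&\text{Assumed conditions, for all } g \in G: \\
&\quad f(g\cdot s,\ \rho_A(g)a) = g\cdot f(s,a), \qquad
       E(g\cdot s) = \rho_Z(g)\,E(s), \\
&\quad P(\rho_Z(g)z,\ \rho_A(g)a) = \rho_Z(g)\,P(z,a), \qquad
       d(\rho_Z(g)z,\ \rho_Z(g)z') = d(z,z'). \\[4pt]
&\text{Rollouts: } s_{t+1} = f(s_t,a_t),\quad \hat z_0 = E(s_0),\quad
       \hat z_{t+1} = P(\hat z_t,a_t),\quad e_t = d\big(E(s_t),\hat z_t\big). \\[4pt]
&\text{Starting from } g\cdot s_0 \text{ with actions } \rho_A(g)a_t,
       \text{ induction on } t \text{ gives }
       s_t^{g} = g\cdot s_t, \quad \hat z_t^{g} = \rho_Z(g)\,\hat z_t. \\[4pt]
&e_t^{g} = d\big(E(g\cdot s_t),\ \rho_Z(g)\hat z_t\big)
         = d\big(\rho_Z(g)E(s_t),\ \rho_Z(g)\hat z_t\big)
         = d\big(E(s_t),\hat z_t\big) = e_t .
\end{aligned}
```

Under these assumptions the whole error curve is identical along the orbit, so any horizon defined as a threshold crossing of that curve, and any multiplicative factor calibrated on it, is identical as well.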

Load-bearing premise

The environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions.

What would settle it

An orbit point where the measured rollout error exceeds the value transported from the calibration point by more than the observed maximum 4.1 percent residual.
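A check of this kind can be sketched on a toy system. The linear planar dynamics, the identity encoder, the rotation group element, and all numeric values below are hypothetical choices for illustration. Only the 4.1% threshold comes from the paper. The residual is computed as an absolute relative gap, which is slightly stricter than the one-sided criterion stated above.

```python
import math

MAX_OBSERVED_RESIDUAL = 0.041  # maximum orbit-transport residual reported in the paper

def scaled_rot(scale, theta):
    """2x2 scaled rotation; these commute with every planar rotation."""
    c, s = scale * math.cos(theta), scale * math.sin(theta)
    return ((c, -s), (s, c))

def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def rollout_error(true_m, model_m, s0, actions):
    """Final-step Euclidean error of a linear toy model (identity encoder)."""
    s, z = s0, s0
    for a in actions:
        ts, tz = matvec(true_m, s), matvec(model_m, z)
        s = (ts[0] + a[0], ts[1] + a[1])
        z = (tz[0] + a[0], tz[1] + a[1])
    return math.hypot(s[0] - z[0], s[1] - z[1])

def transport_residual(true_m, model_m, s0, actions, g):
    """Relative gap between the error at g.s0 and the error transported from s0."""
    base = rollout_error(true_m, model_m, s0, actions)
    moved = rollout_error(true_m, model_m, matvec(g, s0),
                          [matvec(g, a) for a in actions])
    return abs(moved - base) / base

TRUE_M = scaled_rot(0.90, 0.30)
EQUIV_M = scaled_rot(0.95, 0.25)  # commutes with rotations: exactly equivariant
BROKEN_M = ((EQUIV_M[0][0] + 0.05, EQUIV_M[0][1]), EQUIV_M[1])  # anisotropic, not equivariant
G = scaled_rot(1.0, math.pi / 2)  # one orbit element: a 90-degree rotation
S0, ACTIONS = (1.0, 0.5), [(0.1, -0.05)] * 5

for name, model in (("equivariant", EQUIV_M), ("broken", BROKEN_M)):
    r = transport_residual(TRUE_M, model, S0, ACTIONS, G)
    verdict = "exceeds 4.1%" if r > MAX_OBSERVED_RESIDUAL else "within 4.1%"
    print(name, f"residual={r:.2e}", verdict)
```

The exactly equivariant toy model gives a residual at floating-point noise level, while the anisotropic perturbation breaks the commutation with rotations and produces a visible gap.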

Figures

Figures reproduced from arXiv: 2606.24946 by Hongbo Wang.

**Figure 1.** Certificate Cube. A local split-conformal trust-horizon curve is calibrated on one wedge (γ_α multiplies the whole error curve, so H_conf ≤ H_meas; Prop. B). Exact equivariance transports the same calibrated curve over the group orbit, so the orbit-wide coverage event is not a union over g but one calibrated event viewed through the group action (Thm. A, Cor. C). Without equivariance, local calibration stays …

**Figure 2.** The certificate is one-sided conservative and non-vacuous. Left: certified vs. measured horizon over …

**Figure 3.** Calibration cost depends on substrate geometry (both panels are audit-time evaluations of frozen checkpoints …

**Figure 4.** Empirical orbit transport (the measured counterpart to Theorem A, demoted from the main text). On a …

**Figure 5.** Equivariance is a high-variance estimator: the binding downstream cost. Scaling the corpus …

**Figure 6.** Two edges, two fates. The 2D prediction edge …

Original abstract

Learned world models are useful only over horizons on which their rollout error remains controlled. We study trust-horizon certification for latent world models with known group symmetries. Given a one-step latent residual and a finite-time expansion estimate, we form a raw horizon curve and calibrate it with a split-conformal multiplicative factor. On the reproducible audit set, the conformal factor is $\gamma_\alpha=1.0$: the raw certificate is already conservative under the audit protocol. Across 50 stable audits, we observe zero anti-conservative violations, corresponding to an exact-binomial 95% upper bound of 5.8% on the violation rate. Our main structural result is that exact equivariance transports a calibrated trust-horizon curve over the group orbit: when the environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions, rollout errors and trust horizons are orbit-constant. Empirically, the implemented models exhibit small orbit-transport residuals, with median 1.1% and maximum 4.1% over 14 orbit audits. The certificate is also non-vacuous (median certified-to-measured horizon ratio 0.67). A certificate-level calibration-cost study shows two complementary regimes. On a symmetric 2D substrate, equivariant, plain, and augmented models are all orbit-valid from a single calibration sector -- no separation, because the substrate already makes non-equivariant baselines approximately orbit-robust. A 3D yaw audit shows the other regime: the equivariant model obtains a one-sector safe and non-vacuous orbit-valid certificate, while healthy non-equivariant baselines pay violation, slack, sharpness, or additional-sector cost. The certificate is a conservative, distributional audit rather than a global reachability guarantee, and certificate-guided subgoal spacing is not confirmed in the current 3D CEM-MPC behavior layer.
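The 5.8% figure in the abstract is consistent with a one-sided exact (Clopper-Pearson) bound for zero events in 50 trials. The sketch below reproduces that arithmetic. Treating the bound as one-sided is an inference from the number itself, not something this page states, and the function name is hypothetical.

```python
def zero_failure_upper_bound(n, confidence=0.95):
    """One-sided exact upper bound on an event rate after 0 events in n trials."""
    # With zero events, P(0 events | p) = (1 - p)**n.
    # The bound is the p that solves (1 - p)**n = 1 - confidence.
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

bound = zero_failure_upper_bound(50)
print(f"{bound:.3f}")  # prints 0.058
```

A two-sided 95% interval would instead use 0.975 as the confidence level and give a larger upper limit, so the choice of convention matters when comparing against the reported value.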

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that calibrated trust horizons transport over orbits under exact equivariance, with clean audits and a useful 2D/3D regime split, but the models only approximate the exact conditions the theorem needs.


The paper's main result is that when dynamics, encoder, predictor, action transform and latent metric are exactly equivariant, a split-conformal trust horizon calibrated on one sector applies unchanged across the orbit. They combine a raw horizon curve from one-step residuals and finite-time expansion with a multiplicative conformal factor, and on the held-out audit set that factor is already 1.0.

What works is the structural claim itself and the empirical contrast: on the symmetric 2D substrate every model is orbit-valid from one calibration, while in the 3D yaw case only the equivariant model gets a one-sector safe non-vacuous certificate. The audits are reproducible, show zero anti-conservative violations across 50 runs (exact-binomial bound 5.8%), and the median certified-to-measured ratio of 0.67 is non-vacuous. Small orbit-transport residuals (median 1.1%, max 4.1%) are reported transparently.

The soft spot is the distance between the exact-equivariance assumption required for the transport theorem and the approximate satisfaction in the implemented models. Because the certificate is derived under exact transport, any residual means the transported curve can pick up unaccounted error; the paper does not close that gap with extra slack or a robustness margin. The one-step residual and expansion estimate are only sketched, and the certificate remains a distributional audit rather than a planning guarantee.

This is for people working on symmetry-aware world models in control and robotics who need certified rollout lengths. The formal result is precise and the audit work is careful, so it deserves peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes conformal calibration of trust horizons for latent world models with known group symmetries. Given a one-step latent residual and a finite-time expansion estimate, a raw horizon curve is calibrated with a split-conformal multiplicative factor. The key result is that exact equivariance of the dynamics, encoder, predictor, action transform, and latent metric makes rollout errors and trust horizons orbit-constant. Empirical results show zero anti-conservative violations in 50 audits, small orbit-transport residuals (median 1.1%, maximum 4.1% over 14 orbit audits), and non-vacuous certificates (median certified-to-measured ratio 0.67), with an advantage for the equivariant model in the 3D yaw audit.

Significance. If the central claims hold, the work offers a method to certify trust horizons that are invariant under group actions, potentially lowering calibration costs for symmetric environments. Strengths include the reproducible audit protocol with zero anti-conservative violations and the exact-binomial bound, as well as the non-vacuous median ratio. However, the applicability is limited by the exact equivariance assumption.

major comments (2)

[Abstract] The main structural result states that rollout errors and trust horizons are orbit-constant when the environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions. However, the manuscript reports only that the implemented models exhibit small orbit-transport residuals (median 1.1%, max 4.1% over 14 audits) rather than exact satisfaction. Since the conformal calibration and orbit-valid certificate are derived under the exact-transport theorem, deviations from exact equivariance may prevent the calibrated curve from transporting without additional error.

[Abstract] The one-step residual, finite-time expansion estimate, and audit protocol details are only sketched; this limits assessment of the soundness of the 50 audits and of the exact-binomial bound of 5.8% on the violation rate.

minor comments (1)

The abstract mentions 'the certificate is a conservative, distributional audit rather than a global reachability guarantee'; this disclaimer is appropriate but could be expanded in the main text for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions.


Referee: [Abstract] The main structural result states that rollout errors and trust horizons are orbit-constant when the environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions. However, the manuscript reports only that the implemented models exhibit small orbit-transport residuals (median 1.1%, max 4.1% over 14 audits) rather than exact satisfaction. Since the conformal calibration and orbit-valid certificate are derived under the exact-transport theorem, deviations from exact equivariance may prevent the calibrated curve from transporting without additional error.

Authors: The theorem is stated and proved under exact equivariance of the listed components; the empirical residuals are reported precisely to document the gap between theory and the trained models. We agree that the effect of these small deviations on transported certificates should be analyzed explicitly rather than left implicit. In revision we will insert a short paragraph after the theorem statement that (a) recalls the exact-transport assumption and (b) provides a first-order sensitivity argument showing that residuals below 5% induce at most a small additive inflation of the conformal factor, which remains conservative under the audit protocol.

Revision: partial

Referee: [Abstract] The one-step residual, finite-time expansion estimate, and audit protocol details are only sketched; this limits assessment of the soundness of the 50 audits and of the exact-binomial bound of 5.8% on the violation rate.

Authors: The abstract is intentionally concise, but the full manuscript (Sections 3–4 and Appendix B) supplies the exact definitions, the finite-time expansion formula, the split-conformal procedure, and the audit protocol that yields the exact-binomial bound. To address the concern we will (i) add one clarifying sentence to the abstract that names the audit protocol and (ii) ensure the supplementary code release contains the complete audit scripts so that the 50-audit results and the 5.8% bound can be reproduced directly.

Revision: yes

Circularity Check

0 steps flagged

No circularity: structural result follows from explicit equivariance assumptions; calibration uses held-out set


The main structural result asserts orbit-constancy of rollout errors and trust horizons when the listed components satisfy the paper's stated equivariance/invariance conditions. This is a direct implication of the assumptions rather than a reduction by construction. The conformal factor is obtained via split-conformal calibration on a held-out audit set (yielding γ_α = 1.0), not fitted to the orbit-transport claim. Empirical residuals (median 1.1%, max 4.1%) are reported as separate observations. No self-citation chains, uniqueness theorems imported from prior author work, or ansatzes smuggled in via citation appear as load-bearing steps. The derivation remains self-contained against the stated conditions and the external audit protocol.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard conformal prediction machinery plus domain assumptions about equivariance; no new entities are postulated and the single calibrated factor is data-driven rather than ad-hoc.

free parameters (1)

conformal multiplicative factor γ_α = 1.0
Calibrated on the split audit set; reported value 1.0 makes the raw horizon curve already conservative under the protocol.

axioms (1)

Domain assumption: the environment dynamics, encoder, predictor, action transform, and latent metric satisfy the stated equivariance/invariance conditions.
Required for the structural result that rollout errors and trust horizons are orbit-constant.

pith-pipeline@v0.9.1-grok · 5870 in / 1258 out tokens · 27841 ms · 2026-06-26T00:28:07.201054+00:00 · methodology


