Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation

Aishani Pathak; Hasti Seifi

arXiv: 2605.00233 · v1 · submitted 2026-04-30 · cs.CV

Pith reviewed 2026-05-09 19:58 UTC · model grok-4.3

keywords: egocentric pose estimation · conformal prediction · adaptive conformal prediction · geodesic distance · camera pose uncertainty · conditional coverage · DINOv2 features

The pith

Adaptive conformal prediction using a transferable difficulty estimator raises coverage for the hardest egocentric camera poses from 75% to 93% while holding overall coverage at the 90% target.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that ordinary conformal prediction for egocentric camera pose leaves the hardest quarter of frames with only about 60% coverage even when the overall guarantee is set to 90%. It demonstrates that a geodesic SE(3) nonconformity score flags physically tougher frames better than Euclidean distance, with those frames showing two to three times larger ground-truth camera movement. To fix the gap, the authors introduce a two-stage difficulty estimator called DINOv2-Bridge, trained once on one participant, that adapts the prediction threshold for each new frame without needing any images from the test participant. This adaptation lifts coverage on the hardest frames to 93% across 12 participants and multiple predictors while preserving the nominal marginal coverage.

Core claim

Standard fixed-threshold conformal prediction achieves nominal 90% coverage but only ~60% coverage on the hardest 25% of frames (Q4), a gap that persists across 12 participants, 3 predictors, and 3 horizons. A geodesic SE(3) nonconformity score identifies harder frames than Euclidean scoring, with only 15-26% Q4 overlap and 2-3x higher ground-truth displacement. DINOv2-Bridge adaptive CP, a two-stage difficulty estimator trained on a single source participant, transfers cross-participant without test images and raises Q4 coverage from ~0.75 to ~0.93 while keeping overall coverage at the 90% target.
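The gap between marginal and hardest-quartile coverage can be reproduced on synthetic data. The sketch below is not the paper's pipeline: it assumes a hypothetical per-frame difficulty that scales the error, calibrates one split-conformal threshold, and reports coverage overall and on the hardest 25% of frames. The function names, the noise model, and the choice to rank Q4 by the synthetic difficulty are all illustrative assumptions.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) conformal quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank with the usual +1 correction
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def coverage(scores, threshold):
    """Fraction of scores that fall inside the prediction region."""
    return sum(s <= threshold for s in scores) / len(scores)

def simulate(n, rng):
    """Synthetic (difficulty, score) pairs; error scale grows with difficulty."""
    pairs = []
    for _ in range(n):
        d = rng.uniform(0.5, 3.0)  # hypothetical per-frame difficulty
        pairs.append((d, abs(rng.gauss(0.0, d))))
    return pairs

rng = random.Random(0)
calib, test = simulate(4000, rng), simulate(4000, rng)

# One fixed threshold for every frame, calibrated at the 90% level.
q_hat = conformal_quantile([s for _, s in calib], alpha=0.10)

# Q4 = hardest 25% of test frames, ranked here by the synthetic difficulty.
test_sorted = sorted(test, key=lambda p: p[0])
q4 = test_sorted[3 * len(test_sorted) // 4:]

overall_cov = coverage([s for _, s in test], q_hat)
q4_cov = coverage([s for _, s in q4], q_hat)
print(f"overall={overall_cov:.3f}  Q4={q4_cov:.3f}")
```

In this toy setting the single threshold is too loose for easy frames and too tight for hard ones, so overall coverage lands near the target while the hardest quartile falls short.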

What carries the argument

DINOv2-Bridge adaptive CP, a two-stage difficulty estimator trained on one source participant that transfers to new participants without any test images, combined with a geodesic SE(3) nonconformity score that replaces Euclidean distance for ranking frame difficulty.
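The exact form of the geodesic score is not given on this page, so the following is only a sketch of one common surrogate: the rotation part uses the geodesic angle on SO(3), the translation part uses Euclidean distance, and the two are combined with a hypothetical weight `trans_weight`. The names `rotation_angle`, `pose_score`, and `rot_z` are illustrative, and the combination rule is an assumption rather than the authors' definition.

```python
import math

def rotation_angle(R_a, R_b):
    """Geodesic distance on SO(3): the angle of the relative rotation R_a^T R_b."""
    # trace(R_a^T R_b) equals the sum of elementwise products of the two matrices.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_theta)

def pose_score(R_pred, t_pred, R_true, t_true, trans_weight=1.0):
    """Assumed nonconformity score: rotation angle and weighted translation error."""
    theta = rotation_angle(R_pred, R_true)   # radians
    dt = math.dist(t_pred, t_true)           # same units as the translations
    return math.hypot(theta, trans_weight * dt)

def rot_z(angle):
    """Rotation about the z-axis, used only to build a toy example."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

identity = rot_z(0.0)
# A predicted pose that is off by 0.3 rad of yaw and 0.4 units of translation.
score = pose_score(identity, (0.0, 0.0, 0.0), rot_z(0.3), (0.4, 0.0, 0.0))
print(f"score={score:.3f}")
```

A translation-only Euclidean score would assign zero error to two poses that share a position but differ in orientation, which is one reason a rotation-aware score can rank frames differently.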

If this is right

The geodesic SE(3) score consistently flags frames with 2-3 times larger actual camera displacement than Euclidean scoring.
Adaptive threshold adjustment closes the 30-percentage-point conditional coverage gap without retraining the underlying pose predictor.
Overall 90% marginal coverage is preserved across 108 evaluations spanning 12 participants, 3 predictors, and 3 horizons.
The method works with any base pose estimator and requires no images from the target user at deployment time.
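One standard way to adapt a conformal threshold per frame is to divide each nonconformity score by a difficulty estimate before calibrating, then scale the calibrated quantile back up at test time. The sketch below uses that textbook technique (normalized, or locally adaptive, conformal prediction) on synthetic data. It is a stand-in, not the DINOv2-Bridge procedure: the noisy estimate `d_hat` is a hypothetical placeholder for whatever the learned estimator would output.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) conformal quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else sorted(scores)[k - 1]

def simulate(n, rng):
    """Synthetic frames: (true difficulty, estimated difficulty, score)."""
    frames = []
    for _ in range(n):
        d = rng.uniform(0.5, 3.0)
        d_hat = d * rng.uniform(0.8, 1.25)  # noisy, hypothetical estimator output
        frames.append((d, d_hat, abs(rng.gauss(0.0, d))))
    return frames

def overall_and_q4(frames, covered):
    """Coverage overall and on the hardest 25% of frames (by true difficulty)."""
    ranked = sorted(zip(frames, covered), key=lambda fc: fc[0][0])
    q4 = ranked[3 * len(ranked) // 4:]
    return sum(covered) / len(covered), sum(c for _, c in q4) / len(q4)

rng = random.Random(1)
calib, test = simulate(4000, rng), simulate(4000, rng)

# Baseline: one fixed threshold on raw scores.
q_fixed = conformal_quantile([s for _, _, s in calib], 0.10)
fixed_overall, fixed_q4 = overall_and_q4(test, [s <= q_fixed for _, _, s in test])

# Adaptive: calibrate on normalized scores, then rescale per frame.
q_norm = conformal_quantile([s / d_hat for _, d_hat, s in calib], 0.10)
adapt_overall, adapt_q4 = overall_and_q4(
    test, [s <= q_norm * d_hat for _, d_hat, s in test]
)

print(f"fixed:    overall={fixed_overall:.3f}  Q4={fixed_q4:.3f}")
print(f"adaptive: overall={adapt_overall:.3f}  Q4={adapt_q4:.3f}")
```

Because the base pose predictor never appears in this code, the sketch also illustrates why such an adaptation needs no retraining of the predictor: only the scores and the difficulty estimates are used.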

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could support reliable uncertainty bounds for AR headsets in everyday motion where sudden head turns create the hardest frames.
Because the estimator transfers without test images, it may suit privacy-sensitive settings where raw video cannot be sent for calibration.
The same adaptive logic might extend to other SE(3) tasks such as object tracking or robot navigation where conditional coverage on hard cases matters.
If the difficulty signal generalizes further, it could reduce the need for participant-specific calibration datasets in assistive devices.

Load-bearing premise

A difficulty estimator trained on a single source participant transfers cross-participant without any images at test time and without degrading the marginal coverage guarantee.

What would settle it

On a fresh set of participants or longer prediction horizons, measure whether Q4 coverage falls back below 90% or overall coverage deviates from the nominal 90% target when the DINOv2-Bridge estimator is applied without retraining.

Original abstract

Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not just accurate predictions but guaranteed uncertainty regions. Conformal prediction (CP) provides such guarantees without retraining, but we show that standard CP with a single fixed threshold achieves nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (Q4) -- a ~30 percentage-point conditional coverage gap consistent across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring, with only 15-26% Q4 overlap and 2-3x higher ground-truth camera displacement for geodesic Q4 frames. To close the coverage gap, we propose DINOv2-Bridge adaptive CP: a two-stage difficulty estimator trained on a single source participant that transfers cross-participant without any images at test time, improving Q4 coverage from ~0.75 to ~0.93 while maintaining overall coverage at the 90% target.
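The Q4 overlap quoted in the abstract compares which frames two scoring rules place in their hardest quartile. A minimal sketch follows, assuming overlap means the fraction of one score's top quartile that also appears in the other's; the abstract does not define it, and a Jaccard index would be another reasonable reading. The helper names and the toy score lists are hypothetical.

```python
def top_quartile(scores):
    """Indices of the highest-scoring 25% of frames."""
    n = len(scores)
    k = max(1, n // 4)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def q4_overlap(scores_a, scores_b):
    """Fraction of A's hardest quartile that is also in B's hardest quartile."""
    a, b = top_quartile(scores_a), top_quartile(scores_b)
    return len(a & b) / len(a)

# Toy scores for eight frames under two different scoring rules.
toy_geodesic = [0.1, 0.9, 0.2, 0.8, 0.3, 0.4, 0.5, 0.6]
toy_euclidean = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(q4_overlap(toy_geodesic, toy_euclidean))
```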

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Geodesic SE(3) scoring plus single-source DINOv2-Bridge transfer lifts Q4 coverage while keeping marginal at 90%, but the exchangeability assumption under cross-participant application is the part that needs checking.

The paper's real contribution is pairing a geodesic nonconformity score on SE(3) with an adaptive threshold that comes from a difficulty estimator trained on one participant and applied to the others with no test images. The geodesic score flags frames with 2-3x higher ground-truth displacement and only 15-26% overlap with the Euclidean hard set, which is a clean empirical observation across the 108 evaluations on EPIC-Fields. The adaptive step then moves Q4 coverage from roughly 0.75 to 0.93 while the overall coverage stays at the 90% target. That pattern is consistent enough to be worth noticing for anyone doing conformal prediction on pose or tracking tasks. The evaluation breadth (12 participants, 3 predictors, 3 horizons) gives the claims some weight even without error bars in the abstract.

The main soft spot is exactly the one the stress-test note flags. Training the estimator on a single source and deploying it without visual input at test time risks a distribution shift in the difficulty scores. Conformal prediction's finite-sample guarantee rests on exchangeability between calibration and test nonconformity scores; if the predicted difficulties are no longer exchangeable across participants, the marginal coverage can drift even when the reported average looks fine. The abstract claims the coverage holds, but without the exact nonconformity formula, the quartile construction details, or any statistical test on the 0.75-to-0.93 jump, it's hard to judge how robust the transfer really is. No circularity or invented entities show up in what is described.

This is for readers working on uncertainty quantification for egocentric vision or AR tracking who already know conformal prediction and want to see a concrete way to tighten conditional coverage on hard cases. It is not a foundational methods paper, but the empirical pattern is sharp enough that a serious editor should send it to referees so the methods section can be examined directly.
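For reference, the finite-sample guarantee that the letter appeals to can be written out. This is the standard split-conformal statement rather than anything specific to the paper: the symbols n (calibration size), alpha (miscoverage level), and s_i (nonconformity scores) are notation introduced here, and the upper bound additionally assumes the scores are almost surely distinct.

```latex
% Calibration scores s_1, ..., s_n and test score s_{n+1} are assumed exchangeable.
% s_{(k)} denotes the k-th smallest calibration score.
\hat{q} = s_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil,
\qquad
1 - \alpha \;\le\; \Pr\bigl(s_{n+1} \le \hat{q}\bigr) \;\le\; 1 - \alpha + \frac{1}{n+1}.
```

With alpha = 0.10 and n = 999, for example, k = 900. The statement is marginal: it averages over all frames and says nothing about coverage within a subgroup such as Q4, and it no longer applies once calibration and test scores stop being exchangeable.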

Referee Report

2 major / 2 minor

Summary. The paper claims that standard conformal prediction for egocentric camera pose estimation achieves nominal 90% marginal coverage but only ~60% coverage on the hardest quartile (Q4) of frames, a gap observed consistently across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. It introduces a geodesic SE(3) nonconformity score that identifies physically harder frames than Euclidean scoring (15-26% Q4 overlap, 2-3x higher ground-truth displacement). The proposed DINOv2-Bridge adaptive CP trains a two-stage difficulty estimator on a single source participant and transfers it cross-participant without test-time images, raising Q4 coverage from ~0.75 to ~0.93 while preserving the 90% overall target.

Significance. If the empirical coverage improvements hold and the marginal guarantee is preserved under transfer, this provides a practical advance in guaranteed uncertainty quantification for egocentric pose estimation in AR and assistive devices. The consistent results across 108 evaluations, the demonstration that geodesic scoring better captures physical difficulty, and the no-test-image transfer are empirical strengths that could support more reliable deployment in variable conditions.

major comments (2)

The central claim that adaptive CP maintains exact 90% marginal coverage under cross-participant transfer of a single-source DINOv2-Bridge difficulty estimator (without test images) rests on exchangeability of nonconformity scores. Training on one participant introduces potential distribution shift in difficulty predictions; the manuscript must either provide a theoretical argument showing why the finite-sample guarantee is unaffected or include ablations demonstrating coverage under controlled participant shifts, as this is load-bearing for interpreting the Q4 improvement as valid CP adaptation.
Abstract and results: The reported Q4 coverage lift from ~0.75 to ~0.93 is presented without statistical tests, error bars, or explicit details on quartile definition, nonconformity score implementation, or how the 108 evaluations were aggregated. This weakens assessment of robustness, especially given the reader's note on missing post-hoc details.

minor comments (2)

Clarify the exact form of the geodesic SE(3) nonconformity score (e.g., in the methods section) to distinguish it from other manifold distances and enable reproduction.
The abstract's mention of 'parameter-free' aspects of the geodesic score should be cross-checked against any learned components in the difficulty estimator to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point by point below, with proposed revisions to strengthen the manuscript where the concerns are valid.

Referee: The central claim that adaptive CP maintains exact 90% marginal coverage under cross-participant transfer of a single-source DINOv2-Bridge difficulty estimator (without test images) rests on exchangeability of nonconformity scores. Training on one participant introduces potential distribution shift in difficulty predictions; the manuscript must either provide a theoretical argument showing why the finite-sample guarantee is unaffected or include ablations demonstrating coverage under controlled participant shifts, as this is load-bearing for interpreting the Q4 improvement as valid CP adaptation.

Authors: We agree this is a load-bearing point for the validity of the adaptive procedure. The manuscript reports that marginal coverage remains at the 90% target across all 108 evaluations under transfer, but does not contain an explicit theoretical derivation or controlled-shift ablations. In the revision we will add a new subsection that (1) clarifies the procedure: the difficulty estimator is trained once on the source participant and then used only to select per-frame quantiles on target data whose nonconformity scores are computed directly from the target calibration set, preserving exchangeability of those scores; (2) provides a short argument that the marginal guarantee continues to hold exactly because the adaptation modulates only the quantile index and does not alter the calibration-set exchangeability assumption; and (3) includes new ablation tables that train the estimator on one participant and evaluate coverage on each of the remaining 11 participants individually, reporting both mean and worst-case deviation from 90%. These additions will make the Q4 improvement interpretable as a valid conformal adaptation. revision: yes
Referee: Abstract and results: The reported Q4 coverage lift from ~0.75 to ~0.93 is presented without statistical tests, error bars, or explicit details on quartile definition, nonconformity score implementation, or how the 108 evaluations were aggregated. This weakens assessment of robustness, especially given the reader's note on missing post-hoc details.

Authors: We accept that the current presentation lacks the requested statistical and implementation details. In the revised manuscript we will: (a) add error bars (standard deviation across the 12 participants) to all Q4 coverage plots and tables; (b) report paired Wilcoxon signed-rank p-values comparing standard versus adaptive CP on the Q4 subset; (c) expand the methods section with an explicit definition of the quartiles (sorted geodesic nonconformity scores on the calibration set) and a step-by-step description of the geodesic SE(3) nonconformity score; and (d) include a supplementary table that enumerates the exact aggregation (12 participants × 3 predictors × 3 horizons = 108 independent evaluations) together with per-predictor and per-horizon breakdowns. These changes will directly address the robustness concerns. revision: yes
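The per-participant reporting promised in the responses (mean, standard deviation across participants, and worst-case deviation from the 90% target) is simple to compute once coverage values exist. The sketch below shows the arithmetic only; the helper name is hypothetical and the four input values are placeholders chosen for illustration, not results from the paper.

```python
import statistics

def summarize_coverage(per_participant, target=0.90):
    """Mean, sample standard deviation, and worst-case gap from the target."""
    return {
        "mean": statistics.mean(per_participant),
        "std": statistics.stdev(per_participant),
        "worst_deviation": max(abs(c - target) for c in per_participant),
    }

# Placeholder coverage values, one per participant (NOT real measurements).
placeholder_coverage = [0.91, 0.89, 0.90, 0.92]
summary = summarize_coverage(placeholder_coverage)
print(summary)
```

Reporting the worst-case deviation alongside the mean matters here because an average close to 90% can hide an individual participant whose coverage has drifted under transfer.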

Circularity Check

0 steps flagged

No circularity; empirical method with held-out evaluation

The paper presents an empirical method for adaptive conformal prediction using a DINOv2-Bridge difficulty estimator trained on one participant and evaluated on the remaining 11 held-out participants in the EPIC-Fields dataset. Coverage metrics (overall 90% target and Q4 improvement) are reported directly from test-set performance across 108 evaluations. No derivation chain, equation, or first-principles result reduces to its inputs by construction; the geodesic SE(3) score and adaptive threshold are proposed and validated experimentally rather than defined circularly or fitted then renamed as predictions. Any self-citations are incidental and not load-bearing for the central empirical claims.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The difficulty estimator and geodesic distance are presented as engineering choices rather than new theoretical primitives.
