PRISM-SLAM adds a Plücker Ray-Distance Factor and dynamic uncertainty gating to a VFM-augmented factor graph to deliver scale-consistent metric SLAM at 30 FPS from monocular RGB.
International Conference on Learning Representations (ICLR)
6 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (6)
Verdicts: UNVERDICTED (6)
Roles: dataset (1)
Polarities: use dataset (1)
citing papers explorer
- PRISM-SLAM: Probabilistic Ray-Grounded Inference for Scale-aware Metric SLAM
  PRISM-SLAM adds a Plücker Ray-Distance Factor and dynamic uncertainty gating to a VFM-augmented factor graph to deliver scale-consistent metric SLAM at 30 FPS from monocular RGB.
- The Geometry of Projection Heads: Conditioning, Invariance, and Collapse
  Projection heads act as geometric buffers; nonlinear heads induce negative Hessian curvature to escape dimensional collapse while linear heads rely on discrete dynamics and BatchNorm.
- EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera
  EgoForce recovers absolute camera-space 3D hand pose from monocular egocentric images using forearm guidance, a unified arm-hand transformer, and a closed-form ray-space solver that handles fisheye, perspective, and wide-FOV cameras.
- ConsistNav: Closing the Action Consistency Gap in Zero-Shot Object Navigation with Semantic Executive Control
  ConsistNav is a new training-free framework that uses a semantic executive controller, persistent candidate memory, and stability-aware action control to close the action consistency gap in zero-shot object navigation, reporting SOTA results on HM3D and MP3D with 11.4% SR and 7.9% SPL gains on MP3D.
- Sessa: Selective State Space Attention
  Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
- From Pixels to Prompts: Vision-Language Models
  An explanatory book that supplies a clear mental map and intuition for how Vision-Language Models combine vision and language capabilities.