pith. machine review for the scientific record.

arXiv preprint arXiv:2509.12201 (2025)

6 Pith papers cite this work. Polarity classification is still indexing.


citation summary

  • fields: cs.CV (6)
  • years: 2026 (5), 2025 (1)
  • verdicts: unverdicted (6)
  • roles: dataset (1)
  • polarities: use dataset (1)

representative citing papers

Self-Improving 4D Perception via Self-Distillation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight benchmarks.

Depth Anything 3: Recovering the Visual Space from Any Views

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

citing papers explorer

Showing 6 of 6 citing papers.

  • Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models cs.CV · 2026-05-03 · unverdicted · none · ref 64

    M²-REPA decouples modality-specific features inside a diffusion model and aligns each to its matching expert foundation model via an alignment loss plus a decoupling regularizer, yielding better visual quality and long-term consistency in multi-modal video generation.

  • Self-Improving 4D Perception via Self-Distillation cs.CV · 2026-04-09 · unverdicted · none · ref 82

    SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight benchmarks.

  • Depth Anything 3: Recovering the Visual Space from Any Views cs.CV · 2025-11-13 · unverdicted · none · ref 129

    DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

  • WildPose: A Unified Framework for Robust Pose Estimation in the Wild cs.CV · 2026-05-12 · unverdicted · none · ref 61

    WildPose unifies feedforward 3D features from MASt3R with differentiable bundle adjustment for robust monocular pose estimation across dynamic, static, and low-ego-motion scenes.

  • Embody4D: A Generalist 4D World Model for Embodied AI cs.CV · 2026-05-03 · unverdicted · none · ref 62

    Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

  • Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory cs.CV · 2026-04-10 · unverdicted · none · ref 59

    Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.