Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.
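A rough sketch of the voxel-grounded aggregation idea as stated in this summary: per-frame point observations are bucketed into voxels and fused with a softmax over per-observation scores. The voxel size, the source of the scores, and all names below are illustrative assumptions, not Ground4D's actual implementation.

import numpy as np

def voxel_softmax_aggregate(points, feats, scores, voxel_size=0.2):
    """Pool per-frame point features into voxels, softmax-weighting
    competing observations inside each voxel (hypothetical sketch)."""
    vox = np.floor(points / voxel_size).astype(np.int64)  # voxel indices
    out_pts, out_feats = [], []
    for key in {tuple(v) for v in vox}:
        mask = np.all(vox == np.array(key), axis=1)
        w = np.exp(scores[mask] - scores[mask].max())      # stable softmax
        w = w / w.sum()
        out_pts.append((w[:, None] * points[mask]).sum(0))
        out_feats.append((w[:, None] * feats[mask]).sum(0))
    return np.stack(out_pts), np.stack(out_feats)

# Toy usage: 500 observations across frames, 16-dim features, scalar scores.
pts = np.random.randn(500, 3)
agg_pts, agg_feats = voxel_softmax_aggregate(pts, np.random.randn(500, 16),
                                             np.random.randn(500))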
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
18 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV
18 representative citing papers
AirZoo is a new large-scale synthetic dataset for aerial 3D vision that improves state-of-the-art models on image retrieval, cross-view matching, and 3D reconstruction when used for fine-tuning.
Holo360D is the first large-scale dataset providing continuous panoramic sequences with accurately aligned high-completeness depth maps and meshes for training panoramic 3D reconstruction models.
Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.
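A hedged sketch of that test-time recipe: adapt a copy of the pre-trained model to one scene by minimizing a self-supervised loss plus a weighted penalty encoding a prior. The stand-in model, the cross-view consistency loss, and the smoothness prior below are assumptions for illustration only.

import torch

def test_time_refine(model, views, self_loss_fn, prior_fn,
                     steps=50, lam=10.0, lr=1e-4):
    """Penalty-method test-time optimization (illustrative sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(views)
        # Self-supervised objective plus a penalty term enforcing the prior.
        loss = self_loss_fn(pred, views) + lam * prior_fn(pred)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy usage with a stand-in model, consistency loss, and smoothness prior.
model = torch.nn.Conv2d(3, 1, 3, padding=1)
views = torch.rand(2, 3, 64, 64)
self_loss = lambda pred, v: (pred[0] - pred[1]).abs().mean()
smooth_prior = lambda pred: pred.diff(dim=-1).abs().mean()
test_time_refine(model, views, self_loss, smooth_prior, steps=5)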
π³ is a feed-forward network with full permutation equivariance that outputs affine-invariant poses and scale-invariant local point maps without reference frames, reaching state-of-the-art on camera pose, depth, and dense reconstruction benchmarks.
CoGE achieves state-of-the-art monocular geometric estimation in colonoscopy by training solely on simulated data via an illumination-aware Retinex-based module and a wavelet-based structure-aware module.
RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.
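A minimal sketch of that retrieval step: score each cached frame by the average query-key similarity between the incoming frame's queries and the stored frame's keys, then keep the top-k. Tensor shapes and names are assumptions, not VGGT's actual internals.

import torch

def retrieve_frames(query_tokens, memory_keys, k=4):
    """Pick the k most relevant stored frames by query-key similarity.
    query_tokens: (Nq, D) queries of the incoming frame.
    memory_keys:  (F, Nk, D) keys cached per stored frame."""
    q = torch.nn.functional.normalize(query_tokens, dim=-1)
    mk = torch.nn.functional.normalize(memory_keys, dim=-1)
    # Mean cosine similarity over all query/key token pairs, per stored frame.
    sim = torch.einsum("qd,fkd->fqk", q, mk).mean(dim=(1, 2))   # (F,)
    return sim.topk(min(k, sim.numel())).indices

# Toy usage: retrieve 4 of 32 cached frames for a 196-token query frame.
idx = retrieve_frames(torch.randn(196, 64), torch.randn(32, 196, 64))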
RigidFormer learns mesh-free rigid dynamics from point clouds using object-centric anchors, Anchor-Vertex Pooling, Anchor-based RoPE, and differentiable Kabsch alignment to enforce rigidity.
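The Kabsch step used to snap predicted vertices onto a single rigid transform is a standard SVD procedure; a differentiable PyTorch version is sketched below with our own variable names (not RigidFormer's code).

import math
import torch

def kabsch_align(src, dst):
    # Best-fit rotation R and translation t mapping src onto dst, both (N, 3),
    # via SVD of the centered covariance; fully differentiable.
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, S, Vt = torch.linalg.svd(src_c.T @ dst_c)
    d = torch.sign(torch.linalg.det(Vt.T @ U.T))       # avoid reflections
    D = torch.diag(torch.cat([torch.ones(2), d.view(1)]))
    R = Vt.T @ D @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

# Toy check: recover a known rotation about z plus a translation.
c, s = math.cos(0.3), math.sin(0.3)
Rz = torch.tensor([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])
src = torch.randn(100, 3)
dst = src @ Rz.T + torch.tensor([0.5, 0.0, -0.2])
R, t = kabsch_align(src, dst)          # R ~ Rz, t ~ (0.5, 0.0, -0.2)
rigidified = src @ R.T + t             # vertices snapped to the rigid motion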
Sat3R adapts Depth Anything V2 via RPC-aware metric depth fine-tuning to deliver satellite DSM reconstruction with 38% lower MAE than zero-shot baselines and over 300x speedup versus optimization methods.
Ray-aware pointer memory with adaptive retain-or-replace updates enhances stability and accuracy in streaming 3D reconstruction.
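A toy sketch of a retain-or-replace update: each memory slot stores a ray direction and a confidence; an incoming observation either claims the weakest slot (if its ray is novel) or replaces the best-matching slot only when it is more confident. The data structure and thresholds are our assumptions, not the paper's.

import numpy as np

class RetainOrReplaceMemory:
    """Illustrative retain-or-replace memory keyed by ray direction."""
    def __init__(self, size, dim=3):
        self.rays = np.zeros((size, dim))    # unit ray directions
        self.conf = np.full(size, -np.inf)   # per-slot confidences

    def update(self, ray, conf, sim_thresh=0.95):
        ray = ray / np.linalg.norm(ray)
        sims = self.rays @ ray
        j = int(np.argmax(sims))
        if sims[j] < sim_thresh:             # novel ray: evict weakest slot
            j = int(np.argmin(self.conf))
            self.rays[j], self.conf[j] = ray, conf
        elif conf > self.conf[j]:            # matching ray: replace if better
            self.rays[j], self.conf[j] = ray, conf
        # else: retain the existing, more confident entry

mem = RetainOrReplaceMemory(size=128)
mem.update(np.array([0.1, 0.2, 1.0]), conf=0.8)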
Finetuning 3D foundation models on simulated sparse subsets from MegaDepth-X produces robust reconstructions from extremely sparse, noisy internet photos while preserving performance on dense benchmarks.
Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight benchmarks.
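A hedged sketch of self-distillation on unlabeled video with an EMA teacher: the teacher provides pseudo-targets, the student matches them on a perturbed input, and the teacher then tracks the student. The augmentation, loss, EMA rate, and stand-in depth head are all assumptions.

import copy
import torch

def self_distill_step(student, teacher, clip, opt, ema=0.999):
    """One illustrative self-distillation step on an unlabeled clip."""
    with torch.no_grad():
        target = teacher(clip)                            # pseudo-labels
    pred = student(clip + 0.01 * torch.randn_like(clip))  # augmented input
    loss = torch.nn.functional.smooth_l1_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                                 # EMA teacher update
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()

# Toy usage with a stand-in per-frame depth head on an 8-frame clip.
student = torch.nn.Conv2d(3, 1, 3, padding=1)
teacher = copy.deepcopy(student).requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
self_distill_step(student, teacher, torch.rand(8, 3, 64, 64), opt)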
SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and text-to-video synthesis.
WildPose unifies feedforward 3D features from MASt3R with differentiable bundle adjustment for robust monocular pose estimation across dynamic, static, and low-ego-motion scenes.
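A simplified sketch of differentiable bundle adjustment by gradient descent on reprojection error, starting from feedforward 3D points: poses are parameterized as axis-angle plus translation and refined jointly with the points. The pinhole model, optimizer, and synthetic data below are assumptions, not WildPose's pipeline.

import torch

def rodrigues(rvec):
    # Axis-angle (3,) -> rotation matrix (3,3), differentiable and safe at 0.
    theta = torch.sqrt((rvec * rvec).sum() + 1e-12)
    k = rvec / theta
    zero = torch.zeros(())
    K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                     torch.stack([k[2], zero, -k[0]]),
                     torch.stack([-k[1], k[0], zero])])
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def bundle_adjust(points3d, obs, K_intr, steps=200, lr=1e-2):
    # Jointly refine 3D points and camera poses against 2D observations;
    # obs[c] holds the (N, 2) pixel observations seen by camera c.
    pts = points3d.clone().requires_grad_(True)
    rvecs = torch.zeros(len(obs), 3, requires_grad=True)
    tvecs = torch.zeros(len(obs), 3, requires_grad=True)
    opt = torch.optim.Adam([pts, rvecs, tvecs], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for c, uv in enumerate(obs):
            cam = (rodrigues(rvecs[c]) @ pts.T).T + tvecs[c]   # world -> camera
            proj = (K_intr @ cam.T).T
            loss = loss + (proj[:, :2] / proj[:, 2:3] - uv).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return pts.detach(), rvecs.detach(), tvecs.detach()

# Toy usage: one synthetic camera observing 50 noisy feedforward points.
K_intr = torch.tensor([[500., 0., 64.], [0., 500., 64.], [0., 0., 1.]])
gt = torch.rand(50, 3) + torch.tensor([0., 0., 2.])
uv = (K_intr @ gt.T).T[:, :2] / gt[:, 2:3]
pts, rv, tv = bundle_adjust(gt + 0.05 * torch.randn_like(gt), [uv], K_intr)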
LychSim introduces a controllable simulation platform on Unreal Engine 5 with Python API, procedural generation, and LLM integration for vision research tasks.
DINO_4D uses frozen DINOv3 features to inject semantic awareness into 4D dynamic scene reconstruction, improving tracking accuracy and completeness on benchmarks while preserving O(T) complexity.
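A minimal sketch of the frozen-feature injection pattern: run a frozen semantic backbone, project its patch tokens with a small learned adapter, and add them residually to the reconstruction tokens. The stand-in backbone below replaces DINOv3, and all dimensions and names are assumptions.

import torch
import torch.nn as nn

class SemanticInjection(nn.Module):
    """Fuse frozen semantic features into a reconstruction branch (sketch).
    `frozen_backbone` stands in for DINOv3; only the projection is trained."""
    def __init__(self, frozen_backbone, sem_dim, rec_dim):
        super().__init__()
        self.backbone = frozen_backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)              # keep semantics frozen
        self.proj = nn.Linear(sem_dim, rec_dim)  # learned adapter

    def forward(self, frames, rec_tokens):
        with torch.no_grad():
            sem = self.backbone(frames)          # (B, N, sem_dim) assumed
        return rec_tokens + self.proj(sem)       # residual injection

# Toy usage with a stand-in "backbone" producing 384-dim tokens.
backbone = nn.Sequential(nn.Flatten(2), nn.Linear(16 * 16, 384))
inj = SemanticInjection(backbone, sem_dim=384, rec_dim=256)
out = inj(torch.rand(2, 3, 16, 16), torch.rand(2, 3, 256))   # (2, 3, 256)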