Virtual KITTI 2

Mixed citation behavior. Most common role is background (50%).

50 Pith papers citing it
Background 50% of classified citations
abstract

This paper introduces an updated version of the well-known Virtual KITTI dataset, which consists of 5 sequence clones from the KITTI tracking benchmark. In addition, the dataset provides different variants of these sequences, such as modified weather conditions (e.g. fog, rain) or modified camera configurations (e.g. rotated by 15 degrees). For each sequence, we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data. Camera parameters and poses, as well as vehicle locations, are also available. In order to showcase some of the dataset's capabilities, we ran multiple relevant experiments using state-of-the-art algorithms from the field of autonomous driving. The dataset is available for download at https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds.

citation-role summary

dataset 8 · background 3 · method 1

fields

cs.CV 50

representative citing papers

PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

PointDiT is a from-scratch pixel-space Diffusion Transformer for monocular 3D point map estimation that outperforms latent diffusion models in sharpness and ambiguous regions while using a simpler architecture.

Argus: Metric Panoramic 3D Reconstruction for Indoor Scenes

cs.CV · 2026-06-29 · unverdicted · novelty 6.0 · 2 refs

Argus introduces a covisibility module and decomposed pixel-to-world mapping to deliver SOTA metric performance on camera pose, depth, and point cloud tasks using the Realsee3D panoramic dataset.

GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 4 refs

GemDepth adds explicit camera-pose geometry embeddings and an alternating spatio-temporal transformer to produce sharper, more temporally consistent video depth maps than prior smoothing-based methods.

Geometric Context Transformer for Streaming 3D Reconstruction

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20 FPS over sequences longer than 10,000 frames.
