
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal · 2023 · cs.CV · arXiv 2301.00493

Mixed citation behavior. Most common role is background (55%).

39 Pith papers citing it



abstract

We introduce Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.


citation-role summary

dataset 6 · background 4 · baseline 1

citation-polarity summary

background 6 · use dataset 4 · baseline 1

representative citing papers

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segmentation, flow prediction, and motion forecasting.

Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

cs.CV · 2026-05-06 · conditional · novelty 7.0

CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.

TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation

cs.CV · 2026-04-29 · accept · novelty 7.0

TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

ADM-GS decomposes static background appearance into traversal-invariant material and traversal-dependent illumination via a frequency-separated neural light field, yielding +0.98 dB PSNR gains and better cross-traversal consistency on Argoverse 2 and Waymo data.

RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

RayMamba improves long-range 3D object detection by ray-aligned serialization of sparse voxels for state space modeling, delivering up to 2.49 mAP gain on nuScenes in the 40-50 m range.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

UniDAC: Universal Metric Depth Estimation for Any Camera

cs.CV · 2026-03-28 · unverdicted · novelty 7.0

UniDAC achieves universal metric depth estimation across camera types by decoupling relative depth prediction from spatially varying scale estimation using a depth-guided module and distortion-aware positional embedding.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding

cs.CV · 2026-03-02 · unverdicted · novelty 7.0

TopoMaskV3 adds dense offset and height heads to produce standalone 3D road centerlines from masks and reports 28.5 OLS on a new geographically disjoint long-range benchmark.

RetroMotion: Retrocausal Motion Forecasting Models are Instructable

cs.CV · 2025-05-26 · unverdicted · novelty 7.0

Retrocausal transformer decomposes multi-agent motion forecasts into marginals and pairwise joints, models uncertainty with compressed exponentials, achieves strong Waymo results, generalizes to Argoverse 2 and V2X-Seq, and enables implicit instruction following from standard training.

STELLAR: Scaling 3D Perception Large Models for Autonomous Driving

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.

Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

STRELGen combines a multi-agent diffusion model with differentiable STREL specifications to optimize latent space for generating plausible yet safety-critical driving scenarios.

Unlocking Dense Metric Depth Estimation in VLMs

cs.CV · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

DepthVLM converts a standard VLM into a dense metric depth predictor by attaching a lightweight head and training under unified vision-text supervision, outperforming prior VLMs and some pure vision models on a new indoor-outdoor benchmark.

MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

MUSDA proposes hierarchical domain classifiers for multi-modality feature alignment and a prototype graph strategy for multi-source prediction fusion in unsupervised domain adaptation for 3D object detection.

GSMap: 2D Gaussians for Online HD Mapping

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.

Unified Map Prior Encoder for Mapping and Planning

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

UMPE fuses any subset of HD/SD vector maps, raster SD maps, and satellite imagery into BEV features via alignment-aware vector and raster branches, raising mapping mAP by 5.3-5.9 points and cutting planning L2 error by 0.30 m on nuScenes.

LIE: LiDAR-only HD Map Construction with Intensity Enhancement via Online Knowledge Distillation

cs.CV · 2026-05-02 · unverdicted · novelty 6.0

LIE delivers LiDAR-only HD map segmentation via online knowledge distillation that fuses intensity maps, beating the best camera-only model by 8.2% mIoU on nuScenes while adapting quickly to new datasets.

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions

eess.SY · 2026-04-27 · unverdicted · novelty 6.0

VLM-VPI uses Qwen3-VL and GPT-OSS models for pedestrian intent and age reasoning plus a tiered safety controller, reporting 92.3% intent accuracy in CARLA and reduced conflicts versus rule-based and supervised baselines.

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.

EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-shot curve decoder.

citing papers explorer

Showing 39 of 39 citing papers.

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
cs.RO · 2026-05-18 · unverdicted · none · ref 75 · internal anchor
4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segmentation, flow prediction, and motion forecasting.

Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning
cs.CV · 2026-05-09 · unverdicted · none · ref 2 · internal anchor
UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
cs.CV · 2026-05-06 · conditional · none · ref 53 · internal anchor
CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.

TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
cs.CV · 2026-04-29 · accept · none · ref 17 · internal anchor
TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.

WildDet3D: Scaling Promptable 3D Detection in the Wild
cs.CV · 2026-04-09 · unverdicted · none · ref 57 · internal anchor
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction
cs.CV · 2026-04-07 · unverdicted · none · ref 66 · internal anchor
ADM-GS decomposes static background appearance into traversal-invariant material and traversal-dependent illumination via a frequency-separated neural light field, yielding +0.98 dB PSNR gains and better cross-traversal consistency on Argoverse 2 and Waymo data.

RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection
cs.CV · 2026-04-03 · unverdicted · none · ref 2 · internal anchor
RayMamba improves long-range 3D object detection by ray-aligned serialization of sparse voxels for state space modeling, delivering up to 2.49 mAP gain on nuScenes in the 40-50 m range.

A global dataset of continuous urban dashcam driving
cs.CV · 2026-04-01 · accept · none · ref 17 · internal anchor
CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

UniDAC: Universal Metric Depth Estimation for Any Camera
cs.CV · 2026-03-28 · unverdicted · none · ref 59 · internal anchor
UniDAC achieves universal metric depth estimation across camera types by decoupling relative depth prediction from spatially varying scale estimation using a depth-guided module and distortion-aware positional embedding.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
cs.CV · 2026-03-24 · unverdicted · none · ref 76 · internal anchor
KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding
cs.CV · 2026-03-02 · unverdicted · none · ref 35 · internal anchor
TopoMaskV3 adds dense offset and height heads to produce standalone 3D road centerlines from masks and reports 28.5 OLS on a new geographically disjoint long-range benchmark.

RetroMotion: Retrocausal Motion Forecasting Models are Instructable
cs.CV · 2025-05-26 · unverdicted · none · ref 52 · internal anchor
Retrocausal transformer decomposes multi-agent motion forecasts into marginals and pairwise joints, models uncertainty with compressed exponentials, achieves strong Waymo results, generalizes to Argoverse 2 and V2X-Seq, and enables implicit instruction following from standard training.

STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
cs.CV · 2026-05-19 · unverdicted · none · ref 36 · internal anchor
STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.

Guiding Neuro-Symbolic Scenario Generation with Spatio-Temporal Logic
cs.RO · 2026-05-18 · unverdicted · none · ref 26 · internal anchor
STRELGen combines a multi-agent diffusion model with differentiable STREL specifications to optimize latent space for generating plausible yet safety-critical driving scenarios.

Unlocking Dense Metric Depth Estimation in VLMs
cs.CV · 2026-05-15 · unverdicted · none · ref 59 · 2 links · internal anchor
DepthVLM converts a standard VLM into a dense metric depth predictor by attaching a lightweight head and training under unified vision-text supervision, outperforming prior VLMs and some pure vision models on a new indoor-outdoor benchmark.

MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving
cs.CV · 2026-05-11 · unverdicted · none · ref 7 · internal anchor
MUSDA proposes hierarchical domain classifiers for multi-modality feature alignment and a prototype graph strategy for multi-source prediction fusion in unsupervised domain adaptation for 3D object detection.

GSMap: 2D Gaussians for Online HD Mapping
cs.CV · 2026-05-10 · unverdicted · none · ref 32 · internal anchor
GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.

Unified Map Prior Encoder for Mapping and Planning
cs.CV · 2026-05-04 · unverdicted · none · ref 12 · internal anchor
UMPE fuses any subset of HD/SD vector maps, raster SD maps, and satellite imagery into BEV features via alignment-aware vector and raster branches, raising mapping mAP by 5.3-5.9 points and cutting planning L2 error by 0.30 m on nuScenes.

LIE: LiDAR-only HD Map Construction with Intensity Enhancement via Online Knowledge Distillation
cs.CV · 2026-05-02 · unverdicted · none · ref 44 · internal anchor
LIE delivers LiDAR-only HD map segmentation via online knowledge distillation that fuses intensity maps, beating the best camera-only model by 8.2% mIoU on nuScenes while adapting quickly to new datasets.

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
eess.SY · 2026-04-27 · unverdicted · none · ref 20 · internal anchor
VLM-VPI uses Qwen3-VL and GPT-OSS models for pedestrian intent and age reasoning plus a tiered safety controller, reporting 92.3% intent accuracy in CARLA and reduced conflicts versus rule-based and supervised baselines.

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
cs.CV · 2026-04-22 · unverdicted · none · ref 40 · internal anchor
EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
cs.CV · 2026-04-20 · unverdicted · none · ref 106 · 2 links · internal anchor
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
cs.CV · 2026-04-18 · unverdicted · none · ref 38 · internal anchor
CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.

EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications
cs.CV · 2026-04-18 · unverdicted · none · ref 75 · internal anchor
EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-shot curve decoder.

EagleVision: A Multi-Task Benchmark for Cross-Domain Perception in High-Speed Autonomous Racing
cs.RO · 2026-04-13 · unverdicted · none · ref 4 · internal anchor
EagleVision creates a standardized multi-task benchmark for LiDAR perception in high-speed autonomous racing, with experiments showing that pretraining on racing data improves cross-domain detection and prediction performance.

Visually-grounded Humanoid Agents
cs.CV · 2026-04-09 · unverdicted · none · ref 101 · internal anchor
A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
cs.CV · 2026-04-07 · unverdicted · none · ref 47 · internal anchor
Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.

HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes
cs.CV · 2026-04-06 · unverdicted · none · ref 59 · internal anchor
HorizonWeaver enables photorealistic, instruction-driven multi-level editing of complex driving scenes with improved generalization via a new paired dataset, language-guided masks, and joint training losses.

Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
cs.RO · 2026-03-25 · conditional · none · ref 48 · internal anchor
Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.

OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness
cs.CV · 2026-02-22 · unverdicted · none · ref 56 · internal anchor
OpenVO estimates ego-motion from monocular dashcam footage with varying observation rates and uncalibrated cameras by encoding temporal dynamics in a two-frame regression framework and using 3D priors from foundation models, delivering over 20% gains and 46-92% lower errors on KITTI, nuScenes, and A

Flux4D: Flow-based Unsupervised 4D Reconstruction
cs.CV · 2025-12-02 · unverdicted · none · ref 54 · internal anchor
Flux4D reconstructs large-scale dynamic 4D scenes unsupervised by predicting moving 3D Gaussians from photometric losses and static regularization when trained across multiple scenes.

TARS: Traffic-Aware Radar Scene Flow Estimation
cs.CV · 2025-03-13 · conditional · none · ref 20 · internal anchor
TARS jointly performs object detection and radar scene flow estimation by building a Traffic Vector Field from detector features to enforce traffic-level rigid motion consistency, reporting 23% and 15% gains on proprietary and View-of-Delft datasets.

Generating Realistic Safety-Critical Scenarios for Vehicle-Pedestrian Interactions
cs.RO · 2026-05-17 · conditional · none · ref 18 · internal anchor
A three-stage framework pre-trains multi-agent RL agents on real safety-critical data, refines them via online learning in CARLA, and generates the VPSCI dataset of over 198,000 realistic vehicle-pedestrian interaction episodes.

SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
cs.CV · 2026-04-20 · unverdicted · none · ref 49 · internal anchor
SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.

Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic
cs.AI · 2026-04-14 · unverdicted · none · ref 24 · internal anchor
This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.

LEAN-3D: Low-latency Hierarchical Point Cloud Codec for Mobile 3D Streaming
eess.SP · 2026-04-06 · unverdicted · none · ref 41 · internal anchor
LEAN-3D delivers 3-5x lower latency and up to 5.1x lower edge energy for learned point cloud compression on mobile hardware by restricting learned components to shallow hierarchy levels and using deterministic coding deeper in the tree.

Planning by Simulation: Motion Planning with Learning-based Parallel Scenario Prediction for Autonomous Driving
cs.RO · 2024-11-15 · unverdicted · none · ref 48 · internal anchor
PS framework integrates MCTS with query-centric prediction to simulate and cost ego planning actions while accounting for interactive scenario responses on the Argoverse 2 dataset.

AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation
cs.CV · 2026-05-10 · unverdicted · none · ref 87 · internal anchor
AtteConDA adds attention-based conflict suppression to multi-condition diffusion models so that generated driving-scene images retain richer structural cues from the original annotations.

TopoHR: Hierarchical Centerline Representation for Cyclic Topology Reasoning in Driving Scenes with Point-to-Instance Relations
cs.CV · 2026-04-27 · unreviewed · ref 29 · internal anchor
