Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
Pith reviewed 2026-05-12 20:10 UTC · model grok-4.3
The pith
Argoverse 2 is a release of three large-scale datasets supporting new research in self-driving perception and forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Argoverse 2 comprises three datasets: the annotated Sensor Dataset, with 1,000 multimodal sequences and 3D cuboid labels for 26 object categories; the Lidar Dataset, with 20,000 point cloud sequences for self-supervised tasks; and the Motion Forecasting Dataset, with 250,000 interaction scenarios providing track histories for future motion prediction. All three come with 6-DOF pose and HD maps of lane and crosswalk geometry sourced from six cities.
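For orientation, the shape of the release can be captured in a small data structure. A minimal sketch: the counts come directly from the abstract, while the class and field names are illustrative and not part of any official devkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetComponent:
    """Illustrative summary of one Argoverse 2 component (figures from the abstract)."""
    name: str
    num_sequences: int
    labeled: bool
    primary_tasks: tuple

AV2 = (
    DatasetComponent("Sensor", 1_000, True, ("3D detection", "tracking")),
    DatasetComponent("Lidar", 20_000, False, ("self-supervised learning", "point cloud forecasting")),
    DatasetComponent("Motion Forecasting", 250_000, True, ("future motion prediction",)),
)

for component in AV2:
    print(f"{component.name}: {component.num_sequences:,} sequences, labeled={component.labeled}")
```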
What carries the argument
The three complementary datasets supplying multimodal annotated sensor data, large-scale unlabeled lidar sequences, and detailed motion scenarios with HD maps from multiple cities.
If this is right
- 3D perception models can be trained and evaluated using annotations for 26 sufficiently-sampled object categories.
- Self-supervised learning and point cloud forecasting can be pursued with the largest collection of lidar sensor data.
- Motion forecasting models can predict future locations for scored actors based on track histories of location, heading, velocity, and category in challenging scenarios (a minimal baseline is sketched after this list).
- Research can leverage HD maps with 3D lane and crosswalk geometry sourced from six distinct cities.
- The datasets support both new and existing machine learning problems in self-driving that prior collections do not address as effectively.
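To make the forecasting task in the third bullet concrete, here is a constant-velocity baseline, a standard reference point in motion forecasting. This is a minimal sketch assuming plain (T, 2) position arrays; it is not the official Argoverse 2 evaluation code, and real forecasters would also exploit heading, category, and the HD map.

```python
import numpy as np

def constant_velocity_forecast(history_xy: np.ndarray, dt: float, horizon: int) -> np.ndarray:
    """Extrapolate future positions from the last observed velocity.

    history_xy: (T, 2) array of past (x, y) positions at a fixed timestep dt.
    Returns a (horizon, 2) array of predicted positions. Deliberately naive.
    """
    velocity = (history_xy[-1] - history_xy[-2]) / dt       # last-step velocity estimate
    steps = np.arange(1, horizon + 1).reshape(-1, 1)        # 1..horizon as a column
    return history_xy[-1] + steps * (velocity * dt)         # straight-line rollout

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance between predicted and ground-truth future positions (ADE)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage: an actor moving at 1 m/s along x, observed for 2 s at 10 Hz.
past = np.stack([np.linspace(0.0, 1.9, 20), np.zeros(20)], axis=1)
pred = constant_velocity_forecast(past, dt=0.1, horizon=30)  # 3 s forecast
gt = np.stack([np.linspace(2.0, 4.9, 30), np.zeros(30)], axis=1)
print(f"ADE: {average_displacement_error(pred, gt):.3f} m")   # 0.000 for this toy case
```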
Where Pith is reading between the lines
- Researchers may combine the sensor annotations with the forecasting scenarios to develop unified models that perform detection and future-motion prediction together.
- The multi-city coverage could help create models more robust to differences in road layouts and traffic behaviors across urban areas.
- Pre-training on the large unlabeled lidar sequences might boost performance on other 3D vision tasks beyond self-driving.
- Emphasis on interaction-rich scenarios could support development of safer prediction systems that handle complex multi-vehicle situations.
Load-bearing premise
The provided annotations are accurate enough and the selected scenarios are sufficiently representative to drive meaningful improvements in deployed self-driving systems.
What would settle it
The claim would be falsified by a test showing that models trained on Argoverse 2 achieve no measurable gains in accuracy or generalization on independent real-world self-driving benchmarks over models trained on smaller prior datasets.
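One way to operationalize such a test, sketched under assumptions: train one model per source dataset, score both per scene on the same independent benchmark, and check whether the paired gains are distinguishable from zero. The per-scene scores below are invented placeholders, and the paired-bootstrap setup is one reasonable protocol among several, not anything specified by the paper.

```python
import numpy as np

def paired_bootstrap_gain(scores_av2, scores_prior, n_resamples=10_000, seed=0):
    """Paired bootstrap over per-scene metric deltas (e.g., mAP improvements).

    scores_av2 / scores_prior: per-scene scores of models trained on AV2 vs. a
    smaller prior dataset, both evaluated on the same independent benchmark.
    Returns the mean gain and a 95% confidence interval; an interval that
    includes zero would support the falsification scenario described above.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(scores_av2) - np.asarray(scores_prior)
    idx = rng.integers(0, len(deltas), size=(n_resamples, len(deltas)))
    means = deltas[idx].mean(axis=1)                      # bootstrap distribution of the mean gain
    lo, hi = np.percentile(means, [2.5, 97.5])
    return deltas.mean(), (lo, hi)

# Hypothetical per-scene detection scores on an independent benchmark.
gain, (lo, hi) = paired_bootstrap_gain(
    scores_av2=[0.61, 0.58, 0.64, 0.59, 0.66],
    scores_prior=[0.55, 0.57, 0.60, 0.54, 0.62],
)
print(f"mean gain {gain:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```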
read the original abstract
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Argoverse 2, a collection of three datasets for self-driving perception and forecasting research. The Annotated Sensor Dataset provides 1,000 multimodal sequences with high-resolution imagery from seven ring cameras and two stereo cameras, lidar point clouds, 6-DOF map-aligned pose, and 3D cuboid annotations for 26 object categories. The Lidar Dataset offers 20,000 sequences of unlabeled lidar point clouds with map-aligned pose. The Motion Forecasting Dataset contains 250,000 scenarios with track histories for scored actors and HD maps with 3D lane and crosswalk geometry, all sourced from six cities. The authors state that these datasets will support new and existing machine learning research problems in ways that existing datasets do not, and all are released under CC BY-NC-SA 4.0.
Significance. If the datasets are released with the described scale, annotation quality, and diversity, this work provides a substantial resource for advancing 3D perception, self-supervised point cloud learning, point cloud forecasting, and motion prediction. Strengths include the explicit provision of annotation counts, category coverage, multi-city sourcing, and the combination of labeled and unlabeled data at large scale, which directly addresses limitations in prior collections and enables new research directions as claimed.
minor comments (3)
- [Abstract] The statement that all 26 object categories 'are sufficiently-sampled to support training and evaluation of 3D perception models' would be strengthened by including (or referencing) per-category instance counts or a table summarizing annotation statistics, allowing readers to assess this claim directly.
- The manuscript would benefit from a dedicated comparison section or table (e.g., Table 1) against prior datasets such as nuScenes, Waymo Open Dataset, and the original Argoverse to explicitly quantify differences in scale, number of categories, sensor modalities, and geographic coverage.
- [Abstract] Motion Forecasting Dataset description: the criteria used to mine the 250,000 scenarios for 'interesting and challenging interactions' are not detailed in the provided abstract; adding a brief description of the mining process or heuristics would improve reproducibility and clarity (a hypothetical scoring sketch follows this list).
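Since the abstract does not specify the mining heuristics, the following is purely an illustration of what such a filter could look like: a hypothetical interestingness score over ego-actor track pairs. Every feature and threshold here is invented for the sketch and should not be read as the authors' method.

```python
import numpy as np

def interaction_score(ego_xy: np.ndarray, actor_xy: np.ndarray, dt: float = 0.1) -> float:
    """Hypothetical 'interestingness' score for one ego-actor track pair.

    Combines minimum separation and closing speed so that near, fast-approaching
    actors score high. An invented heuristic for illustration only; the paper's
    actual mining criteria are not given in the abstract.
    """
    sep = np.linalg.norm(ego_xy - actor_xy, axis=1)   # per-frame distance (m)
    closing = -np.diff(sep) / dt                      # positive when approaching (m/s)
    min_sep = sep.min()
    peak_closing = max(closing.max(), 0.0)
    return peak_closing / (1.0 + min_sep)             # high when close and closing fast

# Toy check: an actor the ego overtakes scores higher than one pacing it.
t = np.arange(0, 5, 0.1)
ego = np.stack([10.0 * t, np.zeros_like(t)], axis=1)            # ego at 10 m/s
slower = np.stack([4.0 * t + 20.0, np.zeros_like(t)], axis=1)   # actor ahead at 4 m/s
pacing = np.stack([10.0 * t + 20.0, np.zeros_like(t)], axis=1)  # actor holding a 20 m gap
print(interaction_score(ego, slower) > interaction_score(ego, pacing))  # True
```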
Simulated Author's Rebuttal
We thank the referee for their positive summary of the Argoverse 2 datasets and for recommending acceptance. We appreciate the recognition that the scale, annotation quality, multi-city diversity, and combination of labeled and unlabeled data address limitations in prior collections and enable new research directions.
Circularity Check
No circularity: pure dataset release with no derivations or fitted predictions
full rationale
The paper is a data release describing three datasets (Sensor, Lidar, Motion Forecasting) with explicit counts, categories, sourcing, and annotation details. No equations, derivations, parameters, or predictive claims exist that could reduce to inputs by construction. The central assertion that the datasets enable new research rests on the documented scale, diversity, and annotations rather than any self-referential logic or self-citation chain. This is the standard non-circular structure for dataset papers.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 27 Pith papers
- Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning
  UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.
- CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
  CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.
- TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
  TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.
- TopoHR: Hierarchical Centerline Representation for Cyclic Topology Reasoning in Driving Scenes with Point-to-Instance Relations
  TopoHR proposes a hierarchical centerline representation and topology reasoning module with point-to-instance relations and cyclic interactions, achieving new state-of-the-art results on the OpenLane-V2 benchmark for ...
- WildDet3D: Scaling Promptable 3D Detection in the Wild
  WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
- Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction
  ADM-GS decomposes static background appearance into traversal-invariant material and traversal-dependent illumination via a frequency-separated neural light field, yielding +0.98 dB PSNR gains and better cross-travers...
- RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection
  RayMamba improves long-range 3D object detection by ray-aligned serialization of sparse voxels for state space modeling, delivering up to 2.49 mAP gain on nuScenes in the 40-50 m range.
- A global dataset of continuous urban dashcam driving
  CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
- UniDAC: Universal Metric Depth Estimation for Any Camera
  UniDAC achieves universal metric depth estimation across camera types by decoupling relative depth prediction from spatially varying scale estimation using a depth-guided module and distortion-aware positional embedding.
- MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving
  MUSDA proposes hierarchical domain classifiers for multi-modality feature alignment and a prototype graph strategy for multi-source prediction fusion in unsupervised domain adaptation for 3D object detection.
- GSMap: 2D Gaussians for Online HD Mapping
  GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.
- Unified Map Prior Encoder for Mapping and Planning
  UMPE fuses any subset of HD/SD vector maps, raster SD maps, and satellite imagery into BEV features via alignment-aware vector and raster branches, raising mapping mAP by 5.3-5.9 points and cutting planning L2 error b...
- LIE: LiDAR-only HD Map Construction with Intensity Enhancement via Online Knowledge Distillation
  LIE delivers LiDAR-only HD map segmentation via online knowledge distillation that fuses intensity maps, beating the best camera-only model by 8.2% mIoU on nuScenes while adapting quickly to new datasets.
- VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
  VLM-VPI uses Qwen3-VL and GPT-OSS models for pedestrian intent and age reasoning plus a tiered safety controller, reporting 92.3% intent accuracy in CARLA and reduced conflicts versus rule-based and supervised baselines.
- EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
  EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.
- CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
  CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
- EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications
  EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-sh...
- EagleVision: A Multi-Task Benchmark for Cross-Domain Perception in High-Speed Autonomous Racing
  EagleVision creates a standardized multi-task benchmark for LiDAR perception in high-speed autonomous racing, with experiments showing that pretraining on racing data improves cross-domain detection and prediction per...
- Visually-grounded Humanoid Agents
  A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
- Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
  Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
- HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes
  HorizonWeaver enables photorealistic, instruction-driven multi-level editing of complex driving scenes with improved generalization via a new paired dataset, language-guided masks, and joint training losses.
- SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
  SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
- Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic
  This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.
- LEAN-3D: Low-latency Hierarchical Point Cloud Codec for Mobile 3D Streaming
  LEAN-3D delivers 3-5x lower latency and up to 5.1x lower edge energy for learned point cloud compression on mobile hardware by restricting learned components to shallow hierarchy levels and using deterministic coding ...
- AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation
  AtteConDA adds attention-based conflict suppression to multi-condition diffusion models so that generated driving-scene images retain richer structural cues from the original annotations.
Reference graph
Works this paper leans on
- [1] Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In ICCV, October 2019
- [2] Alex Bewley, Pei Sun, Thomas Mensink, Drago Anguelov, and Cristian Sminchisescu. Range conditioned dilated convolutions for scale invariant 3d object detection. In Conference on Robot Learning, 2020
- [3] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020
- [4] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A Multimodal Dataset for Autonomous Driving. In CVPR, 2020
- [5] Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, and Dragomir Anguelov. To the point: Efficient 3d object detection in the range image with graph convolution kernels. In CVPR, June 2021
- [6] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3D Tracking and Forecasting With Rich Maps. In CVPR, 2019
- [7] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020
- [8] Yun Chen, Frieda Rong, Shivam Duggal, Shenlong Wang, Xinchen Yan, Sivabalan Manivasagam, Shangjie Xue, Ersin Yumer, and Raquel Urtasun. GeoSim: Realistic video simulation via geometry-aware composition for self-driving. In CVPR, June 2021
- [9] Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. Kaist multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 19(3):934–948, 2018
- [10] Yukyung Choi, Namil Kim, Kibaek Park, Soonmin Hwang, Jae Shin Yoon, Yoon In, and Inso Kweon. All-day visual place recognition: Benchmark dataset and baseline. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, Workshop on Visual Place Recognition in Changing Environments, 2015
- [11] Alexander Cui, Sergio Casas, Kelvin Wong, Simon Suo, and Raquel Urtasun. Gorela: Go relative for viewpoint-invariant motion forecasting. arXiv preprint arXiv:2211.02545, 2022
- [12] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In ICCV, 2021
- [13] Jin Fang, Qinghao Meng, Dingfu Zhou, Chulin Tang, Jianbing Shen, Cheng-Zhong Xu, and Liangjun Zhang. Technical report for CVPR 2022 workshop on autonomous driving Argoverse 3d object detection competition, 2022
- [14] Nils Gählert, Nicolas Jourdan, Marius Cordts, Uwe Franke, and Joachim Denzler. Cityscapes 3d: Dataset and benchmark for 9 dof vehicle detection. CoRR, abs/2006.07864, 2020
- [15] Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid. VectorNet: Encoding hd maps and agent dynamics from vectorized representation. In CVPR, June 2020
- [16] Runzhou Ge, Zhuangzhuang Ding, Yihan Hu, Yu Wang, Sijia Chen, Li Huang, and Yuan Li. Afdet: Anchor free one stage 3d object detection. In CVPR Workshops, 2020
- [17] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, June 2012
- [18] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, Tiffany Fernandez, Martin Jänicke, Sudesh Mirashi, Chiragkumar Savani, Martin Sturm, Oleksandr Vorobiov, Martin Oelker, Sebastian Garreis, and Peter Schuberth. A2D2: Audi autonomous driving dataset. CoRR, abs/2004.06320, 2020
- [19] Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Home: Heatmap output for future motion estimation. arXiv preprint arXiv:2105.10968, 2021
- [20] Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Thomas: Trajectory heatmap output with learned multi-agent sampling. In ICLR, 2022
- [21] Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, and Zhifeng Chen. Streaming object detection for 3-d point clouds. In ECCV, 2020
- [22] John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, and Peter Ondruska. One Thousand and One Hours: Self-driving Motion Prediction Dataset. arXiv:2006.14480 [cs], November 2020. Presented at CoRL 2020
- [23] Peiyun Hu, Aaron Huang, John Dolan, David Held, and Deva Ramanan. Safe local motion planning with self-supervised freespace forecasting. In CVPR, June 2021
- [24]
- [25] Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, and Deva Ramanan. What-if motion prediction for autonomous driving. arXiv preprint arXiv:2008.10587, 2020
- [26] John W. Lambert and James Hays. Trust, but Verify: Cross-modality fusion for hd map change detection. In Advances in Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
- [27] Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast encoders for object detection from point clouds. In CVPR, June 2019
- [28] Edouard Leurent and Jean Mercat. Social attention for autonomous decision-making in dense traffic. CoRR, abs/1911.12250, 2019
- [29] Mengtian Li, Yu-Xiong Wang, and Deva Ramanan. Towards streaming perception. In ECCV, 2020
- [30] Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online HD map construction and evaluation framework. CoRR, abs/2107.06307, 2021
- [31] Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, and Raquel Urtasun. Learning lane graph representations for motion forecasting. In ECCV, 2020
- [32] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. arXiv preprint arXiv:2205.13542, 2022
- [33] Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. Trafficpredict: Trajectory prediction for heterogeneous traffic-agents. CoRR, abs/1811.02146, 2018
- [34] Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas, and Boris Yangel. Shifts: A dataset of real distributional shift across multiple large-scale tasks. CoRR, abs/2107.07455, 2021
- [35] Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, and Raquel Urtasun. LiDARsim: Realistic lidar simulation by leveraging the real world. In CVPR, June 2020
- [36] Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, and Chunjing Xu. One Million Scenes for Autonomous Driving: ONCE Dataset. arXiv:2106.11037 [cs], August 2021. Accepted to NeurIPS 2021 Datasets and Benchmarks Track
- [37] Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting. In ICRA. IEEE, 2020
- [38] Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting, 2019
- [39] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. CoRR, abs/1903.01568, 2019
- [40] Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3d dataset: Towards autonomous driving in challenging environments. CoRR, abs/1909.07541, 2019
- [41] Matthew Pitropov, Danson Evan Garcia, Jason Rebello, Michael Smart, Carlos Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian adverse driving conditions dataset. The International Journal of Robotics Research, 40(4-5):681–690, Dec 2020
- [42] Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, and Dragomir Anguelov. Offboard 3d object detection from point cloud sequences. In CVPR, June 2021
- [43] Jagjeet Singh, William Qi, Tanmay Agarwal, and Andrew Hartnett. Argoverse motion forecasting competition. https://eval.ai/web/challenges/challenge-page/454/overview. Accessed: 08-27-2021
- [44] Tong Su, Xishun Wang, and Xiaodong Yang. Qml for argoverse 2 motion forecasting challenge, 2022
- [45] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In CVPR, 2020
- [46] Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, and Dragomir Anguelov. Rsn: Range sparse net for efficient, accurate lidar 3d object detection. In CVPR, June 2021
- [47] Mingkun Wang, Xinge Zhu, Changqian Yu, Wei Li, Yuexin Ma, Ruochun Jin, Xiaoguang Ren, Dongchun Ren, Mingxu Wang, and Wenjing Yang. Ganet: Goal area network for motion forecasting, 2022
- [48] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nicholas Rhinehart. 4d forecasting: Sequential forecasting of 100,000 points. In Proceedings of ECCV ’20 Workshops, August 2020
- [49] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nick Rhinehart. Inverting the forecasting pipeline with spf2: Sequential pointcloud forecasting for sequential pose forecasting. In Conference on Robot Learning (CoRL), November 2020
- [50] Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, and Henrik Kretzschmar. Surfelgan: Synthesizing realistic sensor data for autonomous driving. In CVPR, June 2020
- [51] Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. Center-based 3d object detection and tracking. In CVPR, June 2021
- [52] Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kummerle, Hendrik Konigshof, Christoph Stiller, Arnaud de La Fortelle, et al. Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088, 2019
- [53] Chen Zhang, Honglin Sun, Chen Chen, and Yandong Guo. Banet: Motion forecasting with boundary aware network, 2022
- [54] Jannik Zürn, Johan Vertens, and Wolfram Burgard. Lane graph estimation for scene understanding in urban driving. CoRR, abs/2105.00195, 2021