Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
Pith reviewed 2026-05-12 20:10 UTC · model grok-4.3
The pith
Argoverse 2 is a release of three large-scale datasets supporting new research in self-driving perception and forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Argoverse 2 comprises three datasets: the annotated Sensor Dataset, with 1,000 multimodal sequences and 3D cuboid labels for 26 object categories; the Lidar Dataset, with 20,000 point cloud sequences for self-supervised tasks; and the Motion Forecasting Dataset, with 250,000 interaction scenarios providing track histories for future motion prediction. All three come with 6-DOF pose and HD maps of lane and crosswalk geometry sourced from six cities.
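For orientation, the shape of the release can be captured in a small data structure. A minimal sketch: the counts come directly from the abstract, while the class and field names are illustrative and not part of any official devkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetComponent:
    """Illustrative summary of one Argoverse 2 component (figures from the abstract)."""
    name: str
    num_sequences: int
    labeled: bool
    primary_tasks: tuple

AV2 = (
    DatasetComponent("Sensor", 1_000, True, ("3D detection", "tracking")),
    DatasetComponent("Lidar", 20_000, False, ("self-supervised learning", "point cloud forecasting")),
    DatasetComponent("Motion Forecasting", 250_000, True, ("future motion prediction",)),
)

for component in AV2:
    print(f"{component.name}: {component.num_sequences:,} sequences, labeled={component.labeled}")
```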
What carries the argument
The three complementary datasets supplying multimodal annotated sensor data, large-scale unlabeled lidar sequences, and detailed motion scenarios with HD maps from multiple cities.
If this is right
- 3D perception models can be trained and evaluated using annotations for 26 sufficiently-sampled object categories.
- Self-supervised learning and point cloud forecasting can be pursued with the largest collection of lidar sensor data.
- Motion forecasting models can predict future locations for scored actors based on track histories of location, heading, velocity, and category in challenging scenarios (a minimal baseline is sketched after this list).
- Research can leverage HD maps with 3D lane and crosswalk geometry sourced from six distinct cities.
- The datasets support both new and existing machine learning problems in self-driving that prior collections do not address as effectively.
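To make the forecasting task in the third bullet concrete, here is a constant-velocity baseline, a standard reference point in motion forecasting. This is a minimal sketch assuming plain (T, 2) position arrays; it is not the official Argoverse 2 evaluation code, and real forecasters would also exploit heading, category, and the HD map.

```python
import numpy as np

def constant_velocity_forecast(history_xy: np.ndarray, dt: float, horizon: int) -> np.ndarray:
    """Extrapolate future positions from the last observed velocity.

    history_xy: (T, 2) array of past (x, y) positions at a fixed timestep dt.
    Returns a (horizon, 2) array of predicted positions. Deliberately naive.
    """
    velocity = (history_xy[-1] - history_xy[-2]) / dt       # last-step velocity estimate
    steps = np.arange(1, horizon + 1).reshape(-1, 1)        # 1..horizon as a column
    return history_xy[-1] + steps * (velocity * dt)         # straight-line rollout

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance between predicted and ground-truth future positions (ADE)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage: an actor moving at 1 m/s along x, observed for 2 s at 10 Hz.
past = np.stack([np.linspace(0.0, 1.9, 20), np.zeros(20)], axis=1)
pred = constant_velocity_forecast(past, dt=0.1, horizon=30)  # 3 s forecast
gt = np.stack([np.linspace(2.0, 4.9, 30), np.zeros(30)], axis=1)
print(f"ADE: {average_displacement_error(pred, gt):.3f} m")   # 0.000 for this toy case
```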
Where Pith is reading between the lines
- Researchers may combine the sensor annotations with the forecasting scenarios to develop unified models that perform detection and future-motion prediction together.
- The multi-city coverage could help create models more robust to differences in road layouts and traffic behaviors across urban areas.
- Pre-training on the large unlabeled lidar sequences might boost performance on other 3D vision tasks beyond self-driving.
- Emphasis on interaction-rich scenarios could support development of safer prediction systems that handle complex multi-vehicle situations.
Load-bearing premise
The provided annotations are accurate enough and the selected scenarios are sufficiently representative to drive meaningful improvements in deployed self-driving systems.
What would settle it
The claim would be falsified by a test showing that models trained on Argoverse 2 achieve no measurable gains in accuracy or generalization on independent real-world self-driving benchmarks over models trained on smaller prior datasets.
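One way to operationalize such a test, sketched under assumptions: train one model per source dataset, score both per scene on the same independent benchmark, and check whether the paired gains are distinguishable from zero. The per-scene scores below are invented placeholders, and the paired-bootstrap setup is one reasonable protocol among several, not anything specified by the paper.

```python
import numpy as np

def paired_bootstrap_gain(scores_av2, scores_prior, n_resamples=10_000, seed=0):
    """Paired bootstrap over per-scene metric deltas (e.g., mAP improvements).

    scores_av2 / scores_prior: per-scene scores of models trained on AV2 vs. a
    smaller prior dataset, both evaluated on the same independent benchmark.
    Returns the mean gain and a 95% confidence interval; an interval that
    includes zero would support the falsification scenario described above.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(scores_av2) - np.asarray(scores_prior)
    idx = rng.integers(0, len(deltas), size=(n_resamples, len(deltas)))
    means = deltas[idx].mean(axis=1)                      # bootstrap distribution of the mean gain
    lo, hi = np.percentile(means, [2.5, 97.5])
    return deltas.mean(), (lo, hi)

# Hypothetical per-scene detection scores on an independent benchmark.
gain, (lo, hi) = paired_bootstrap_gain(
    scores_av2=[0.61, 0.58, 0.64, 0.59, 0.66],
    scores_prior=[0.55, 0.57, 0.60, 0.54, 0.62],
)
print(f"mean gain {gain:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```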
read the original abstract
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Argoverse 2, a collection of three datasets for self-driving perception and forecasting research. The Annotated Sensor Dataset provides 1,000 multimodal sequences with high-resolution imagery from seven ring cameras and two stereo cameras, lidar point clouds, 6-DOF map-aligned pose, and 3D cuboid annotations for 26 object categories. The Lidar Dataset offers 20,000 sequences of unlabeled lidar point clouds with map-aligned pose. The Motion Forecasting Dataset contains 250,000 scenarios with track histories for scored actors and HD maps with 3D lane and crosswalk geometry, all sourced from six cities. The authors state that these datasets will support new and existing machine learning research problems in ways that existing datasets do not, and all are released under CC BY-NC-SA 4.0.
Significance. If the datasets are released with the described scale, annotation quality, and diversity, this work provides a substantial resource for advancing 3D perception, self-supervised point cloud learning, point cloud forecasting, and motion prediction. Strengths include the explicit provision of annotation counts, category coverage, multi-city sourcing, and the combination of labeled and unlabeled data at large scale, which directly addresses limitations in prior collections and enables new research directions as claimed.
minor comments (3)
- [Abstract] The statement that all 26 object categories 'are sufficiently-sampled to support training and evaluation of 3D perception models' would be strengthened by including (or referencing) per-category instance counts or a table summarizing annotation statistics, allowing readers to assess this claim directly.
- The manuscript would benefit from a dedicated comparison section or table (e.g., Table 1) against prior datasets such as nuScenes, Waymo Open Dataset, and the original Argoverse to explicitly quantify differences in scale, number of categories, sensor modalities, and geographic coverage.
- [Abstract] Motion Forecasting Dataset description: the criteria used to mine the 250,000 scenarios for 'interesting and challenging interactions' are not detailed in the provided abstract; adding a brief description of the mining process or heuristics would improve reproducibility and clarity (a hypothetical scoring sketch follows this list).
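Since the abstract does not specify the mining heuristics, the following is purely an illustration of what such a filter could look like: a hypothetical interestingness score over ego-actor track pairs. Every feature and threshold here is invented for the sketch and should not be read as the authors' method.

```python
import numpy as np

def interaction_score(ego_xy: np.ndarray, actor_xy: np.ndarray, dt: float = 0.1) -> float:
    """Hypothetical 'interestingness' score for one ego-actor track pair.

    Combines minimum separation and closing speed so that near, fast-approaching
    actors score high. An invented heuristic for illustration only; the paper's
    actual mining criteria are not given in the abstract.
    """
    sep = np.linalg.norm(ego_xy - actor_xy, axis=1)   # per-frame distance (m)
    closing = -np.diff(sep) / dt                      # positive when approaching (m/s)
    min_sep = sep.min()
    peak_closing = max(closing.max(), 0.0)
    return peak_closing / (1.0 + min_sep)             # high when close and closing fast

# Toy check: an actor the ego overtakes scores higher than one pacing it.
t = np.arange(0, 5, 0.1)
ego = np.stack([10.0 * t, np.zeros_like(t)], axis=1)            # ego at 10 m/s
slower = np.stack([4.0 * t + 20.0, np.zeros_like(t)], axis=1)   # actor ahead at 4 m/s
pacing = np.stack([10.0 * t + 20.0, np.zeros_like(t)], axis=1)  # actor holding a 20 m gap
print(interaction_score(ego, slower) > interaction_score(ego, pacing))  # True
```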
Simulated Author's Rebuttal
We thank the referee for their positive summary of the Argoverse 2 datasets and for recommending acceptance. We appreciate the recognition that the scale, annotation quality, multi-city diversity, and combination of labeled and unlabeled data address limitations in prior collections and enable new research directions.
Circularity Check
No circularity: pure dataset release with no derivations or fitted predictions
full rationale
The paper is a data release describing three datasets (Sensor, Lidar, Motion Forecasting) with explicit counts, categories, sourcing, and annotation details. No equations, derivations, parameters, or predictive claims exist that could reduce to inputs by construction. The central assertion that the datasets enable new research rests on the documented scale, diversity, and annotations rather than any self-referential logic or self-citation chain. This is the standard non-circular structure for dataset papers.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 27 Pith papers
- Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning
  UniTopo unifies lane detection and topology reasoning into a single perception model, outperforming prior methods on OpenLane-V2 benchmarks with TOP_ll scores of 30.1% and 31.8%.
- CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
  CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.
- TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
  TRIP-Evaluate is a new open multimodal benchmark with 837 text, image, and point-cloud items organized by a role-task-knowledge taxonomy to evaluate large models on transportation workflows.
- TopoHR: Hierarchical Centerline Representation for Cyclic Topology Reasoning in Driving Scenes with Point-to-Instance Relations
  TopoHR proposes a hierarchical centerline representation and topology reasoning module with point-to-instance relations and cyclic interactions, achieving new state-of-the-art results on the OpenLane-V2 benchmark for ...
- WildDet3D: Scaling Promptable 3D Detection in the Wild
  WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
- Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction
  ADM-GS decomposes static background appearance into traversal-invariant material and traversal-dependent illumination via a frequency-separated neural light field, yielding +0.98 dB PSNR gains and better cross-travers...
- RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection
  RayMamba improves long-range 3D object detection by ray-aligned serialization of sparse voxels for state space modeling, delivering up to 2.49 mAP gain on nuScenes in the 40-50 m range.
- A global dataset of continuous urban dashcam driving
  CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
- UniDAC: Universal Metric Depth Estimation for Any Camera
  UniDAC achieves universal metric depth estimation across camera types by decoupling relative depth prediction from spatially varying scale estimation using a depth-guided module and distortion-aware positional embedding.
- MUSDA: Multi-source Multi-modality Unsupervised Domain Adaptive 3D Object Detection for Autonomous Driving
  MUSDA proposes hierarchical domain classifiers for multi-modality feature alignment and a prototype graph strategy for multi-source prediction fusion in unsupervised domain adaptation for 3D object detection.
- GSMap: 2D Gaussians for Online HD Mapping
  GSMap represents HD map elements as sequences of 2D Gaussians to unify geometric precision and topological regularity for online autonomous driving maps.
- Unified Map Prior Encoder for Mapping and Planning
  UMPE fuses any subset of HD/SD vector maps, raster SD maps, and satellite imagery into BEV features via alignment-aware vector and raster branches, raising mapping mAP by 5.3-5.9 points and cutting planning L2 error b...
- LIE: LiDAR-only HD Map Construction with Intensity Enhancement via Online Knowledge Distillation
  LIE delivers LiDAR-only HD map segmentation via online knowledge distillation that fuses intensity maps, beating the best camera-only model by 8.2% mIoU on nuScenes while adapting quickly to new datasets.
- VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
  VLM-VPI uses Qwen3-VL and GPT-OSS models for pedestrian intent and age reasoning plus a tiered safety controller, reporting 92.3% intent accuracy in CARLA and reduced conflicts versus rule-based and supervised baselines.
- EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
  EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.
- CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
  CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
- EdgeVTP: Exploration of Latency-efficient Trajectory Prediction for Edge-based Embedded Vision Applications
  EdgeVTP delivers the lowest measured end-to-end latency on Jetson-class platforms while matching or exceeding state-of-the-art accuracy on highway trajectory benchmarks by using bounded graph interactions and a one-sh...
- EagleVision: A Multi-Task Benchmark for Cross-Domain Perception in High-Speed Autonomous Racing
  EagleVision creates a standardized multi-task benchmark for LiDAR perception in high-speed autonomous racing, with experiments showing that pretraining on racing data improves cross-domain detection and prediction per...
- Visually-grounded Humanoid Agents
  A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
- Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
  Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
- HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes
  HorizonWeaver enables photorealistic, instruction-driven multi-level editing of complex driving scenes with improved generalization via a new paired dataset, language-guided masks, and joint training losses.
- SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection
  SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
- Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic
  This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.
- LEAN-3D: Low-latency Hierarchical Point Cloud Codec for Mobile 3D Streaming
  LEAN-3D delivers 3-5x lower latency and up to 5.1x lower edge energy for learned point cloud compression on mobile hardware by restricting learned components to shallow hierarchy levels and using deterministic coding ...
- AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation
  AtteConDA adds attention-based conflict suppression to multi-condition diffusion models so that generated driving-scene images retain richer structural cues from the original annotations.
Reference graph
Works this paper leans on
- [1] Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In ICCV, October 2019
- [2] Alex Bewley, Pei Sun, Thomas Mensink, Drago Anguelov, and Cristian Sminchisescu. Range conditioned dilated convolutions for scale invariant 3d object detection. In Conference on Robot Learning, 2020
- [3] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020
- [4] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A Multimodal Dataset for Autonomous Driving. In CVPR, 2020
- [5] Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, and Dragomir Anguelov. To the point: Efficient 3d object detection in the range image with graph convolution kernels. In CVPR, June 2021
- [6] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3D Tracking and Forecasting With Rich Maps. In CVPR, 2019
- [7] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020
- [8] Yun Chen, Frieda Rong, Shivam Duggal, Shenlong Wang, Xinchen Yan, Sivabalan Manivasagam, Shangjie Xue, Ersin Yumer, and Raquel Urtasun. GeoSim: Realistic video simulation via geometry-aware composition for self-driving. In CVPR, June 2021
- [9] Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. Kaist multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 19(3):934–948, 2018
- [10] Yukyung Choi, Namil Kim, Kibaek Park, Soonmin Hwang, Jae Shin Yoon, Yoon In, and Inso Kweon. All-day visual place recognition: Benchmark dataset and baseline. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, Workshop on Visual Place Recognition in Changing Environments, 2015
- [11] Alexander Cui, Sergio Casas, Kelvin Wong, Simon Suo, and Raquel Urtasun. Gorela: Go relative for viewpoint-invariant motion forecasting. arXiv preprint arXiv:2211.02545, 2022
- [12] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In ICCV, 2021
- [13] Jin Fang, Qinghao Meng, Dingfu Zhou, Chulin Tang, Jianbing Shen, Cheng-Zhong Xu, and Liangjun Zhang. Technical report for CVPR 2022 workshop on autonomous driving Argoverse 3d object detection competition, 2022
- [14] Nils Gählert, Nicolas Jourdan, Marius Cordts, Uwe Franke, and Joachim Denzler. Cityscapes 3d: Dataset and benchmark for 9 dof vehicle detection. CoRR, abs/2006.07864, 2020
- [15] Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid. VectorNet: Encoding hd maps and agent dynamics from vectorized representation. In CVPR, June 2020
- [16] Runzhou Ge, Zhuangzhuang Ding, Yihan Hu, Yu Wang, Sijia Chen, Li Huang, and Yuan Li. Afdet: Anchor free one stage 3d object detection. In CVPR Workshops, 2020
- [17] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, June 2012
- [18] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, Tiffany Fernandez, Martin Jänicke, Sudesh Mirashi, Chiragkumar Savani, Martin Sturm, Oleksandr Vorobiov, Martin Oelker, Sebastian Garreis, and Peter Schuberth. A2D2: Audi autonomous driving dataset. CoRR, abs/2004.06320, 2020
- [19] Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Home: Heatmap output for future motion estimation. arXiv preprint arXiv:2105.10968, 2021
- [20] Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, and Fabien Moutarde. Thomas: Trajectory heatmap output with learned multi-agent sampling. In ICLR, 2022
- [21] Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, and Zhifeng Chen. Streaming object detection for 3-d point clouds. In ECCV, 2020
- [22] John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, and Peter Ondruska. One Thousand and One Hours: Self-driving Motion Prediction Dataset. arXiv:2006.14480 [cs], November 2020. Presented at CoRL 2020
- [23] Peiyun Hu, Aaron Huang, John Dolan, David Held, and Deva Ramanan. Safe local motion planning with self-supervised freespace forecasting. In CVPR, June 2021
- [24]
- [25] Siddhesh Khandelwal, William Qi, Jagjeet Singh, Andrew Hartnett, and Deva Ramanan. What-if motion prediction for autonomous driving. arXiv preprint arXiv:2008.10587, 2020
- [26] John W. Lambert and James Hays. Trust, but Verify: Cross-modality fusion for hd map change detection. In Advances in Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
- [27] Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. PointPillars: Fast encoders for object detection from point clouds. In CVPR, June 2019
- [28] Edouard Leurent and Jean Mercat. Social attention for autonomous decision-making in dense traffic. CoRR, abs/1911.12250, 2019
- [29] Mengtian Li, Yu-Xiong Wang, and Deva Ramanan. Towards streaming perception. In ECCV, 2020
- [30] Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online HD map construction and evaluation framework. CoRR, abs/2107.06307, 2021
- [31] Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, and Raquel Urtasun. Learning lane graph representations for motion forecasting. In ECCV, 2020
- [32] Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. arXiv preprint arXiv:2205.13542, 2022
- [33] Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. Trafficpredict: Trajectory prediction for heterogeneous traffic-agents. CoRR, abs/1811.02146, 2018
- [34] Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas, and Boris Yangel. Shifts: A dataset of real distributional shift across multiple large-scale tasks. CoRR, abs/2107.07455, 2021
- [35] Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, and Raquel Urtasun. LiDARsim: Realistic lidar simulation by leveraging the real world. In CVPR, June 2020
- [36] Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, and Chunjing Xu. One Million Scenes for Autonomous Driving: ONCE Dataset. arXiv:2106.11037 [cs], August 2021. Accepted to NeurIPS 2021 Datasets and Benchmarks Track
- [37] Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting. In ICRA. IEEE, 2020
- [38] Jean Mercat, Thomas Gilles, Nicole El Zoghby, Guillaume Sandou, Dominique Beauvois, and Guillermo Pita Gil. Multi-head attention for multi-modal joint vehicle motion forecasting, 2019
- [39] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3d multi-object detection and tracking in crowded urban scenes. CoRR, abs/1903.01568, 2019
- [40] Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, and Jie Lin. A*3d dataset: Towards autonomous driving in challenging environments. CoRR, abs/1909.07541, 2019
- [41] Matthew Pitropov, Danson Evan Garcia, Jason Rebello, Michael Smart, Carlos Wang, Krzysztof Czarnecki, and Steven Waslander. Canadian adverse driving conditions dataset. The International Journal of Robotics Research, 40(4-5):681–690, Dec 2020
- [42] Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, and Dragomir Anguelov. Offboard 3d object detection from point cloud sequences. In CVPR, June 2021
- [43] Jagjeet Singh, William Qi, Tanmay Agarwal, and Andrew Hartnett. Argoverse motion forecasting competition. https://eval.ai/web/challenges/challenge-page/454/overview. Accessed: 08-27-2021
- [44] Tong Su, Xishun Wang, and Xiaodong Yang. Qml for argoverse 2 motion forecasting challenge, 2022
- [45] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In CVPR, 2020
- [46] Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, and Dragomir Anguelov. Rsn: Range sparse net for efficient, accurate lidar 3d object detection. In CVPR, June 2021
- [47] Mingkun Wang, Xinge Zhu, Changqian Yu, Wei Li, Yuexin Ma, Ruochun Jin, Xiaoguang Ren, Dongchun Ren, Mingxu Wang, and Wenjing Yang. Ganet: Goal area network for motion forecasting, 2022
- [48] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nicholas Rhinehart. 4d forecasting: Sequential forecasting of 100,000 points. In Proceedings of ECCV ’20 Workshops, August 2020
- [49] Xinshuo Weng, Jianren Wang, Sergey Levine, Kris Kitani, and Nick Rhinehart. Inverting the forecasting pipeline with spf2: Sequential pointcloud forecasting for sequential pose forecasting. In Conference on Robot Learning (CoRL), November 2020
- [50] Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, and Henrik Kretzschmar. Surfelgan: Synthesizing realistic sensor data for autonomous driving. In CVPR, June 2020
- [51] Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. Center-based 3d object detection and tracking. In CVPR, June 2021
- [52] Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kummerle, Hendrik Konigshof, Christoph Stiller, Arnaud de La Fortelle, et al. Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088, 2019
- [53] Chen Zhang, Honglin Sun, Chen Chen, and Yandong Guo. Banet: Motion forecasting with boundary aware network, 2022
- [54] Jannik Zürn, Johan Vertens, and Wolfram Burgard. Lane graph estimation for scene understanding in urban driving. CoRR, abs/2105.00195, 2021