Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.
hub Canonical reference
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
Canonical reference. 71% of citing Pith papers cite this work as background.
abstract
Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially develop its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In the experiment, BEVDet offers an excellent trade-off between accuracy and time-efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D, but requires just 11% computational budget of 215.3 GFLOPs and runs 9.2 times faster at 15.6 FPS. Another high-precision version dubbed BEVDet-Base scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet .
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.
EgoEV-HandPose uses stereo event cameras and a bird's-eye-view fusion module to achieve 30.54 mm MPJPE and 86.87% gesture accuracy on a new large-scale egocentric dataset, outperforming prior RGB and event methods especially in low light and occlusion.
PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.
The paper organizes perception attacks on AVs into a new taxonomy, identifies gaps in fusion-aware defenses, and validates one cross-sensor vulnerability with a proof-of-concept simulation.
Dynamic token selection and training only 1.6 million parameters instead of over 300 million reduces computation by 48-55% and improves accuracy over prior state-of-the-art on the NuScenes dataset.
DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.
TopoMaskV3 adds dense offset and height heads to produce standalone 3D road centerlines from masks and reports 28.5 OLS on a new geographically disjoint long-range benchmark.
HiPR improves 3D occupancy prediction by reparameterizing image-to-voxel projections using LiDAR-derived height priors to adapt sampling ranges to scene sparsity and height variations.
SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse2.
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.
R4Det fuses 4D radar and camera inputs via panoramic depth fusion, deformable gated temporal fusion without ego pose, and instance-guided refinement to reach state-of-the-art 3D detection on TJ4DRadSet and VoD.
TFusionOcc uses a family of Student's t-distribution T-primitives and a T-mixture model for multi-sensor 3D occupancy prediction, reporting state-of-the-art results on nuScenes.
DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.
DriveVLA-W0 adds world modeling to predict future images in VLA models, overcoming sparse action supervision and amplifying data scaling laws on NAVSIM benchmarks and a large in-house dataset.
BePo proposes a dual BEV and sparse-points representation with cross-attention fusion for more accurate and efficient 3D occupancy prediction on autonomous driving benchmarks.
RQR3D reparametrizes oriented bounding box regression in BEV 3D detection as regressing a horizontal box plus corner offsets and achieves SOTA camera-radar performance on nuScenes with 67.5 NDS and 59.7 mAP.
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.
Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.
SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.
citing papers explorer
-
Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations
Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.