hub

arXiv preprint arXiv:2112.11790 (2021)

Junjie Huang, Guan Huang, Zheng Zhu, Ye Yun, Dalong Du · 2021 · arXiv 2112.11790

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

read on arXiv browse 18 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving

cs.CR · 2026-05-12 · unverdicted · novelty 8.0

Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

EgoEV-HandPose uses stereo event cameras and a bird's-eye-view fusion module to achieve 30.54 mm MPJPE and 86.87% gesture accuracy on a new large-scale egocentric dataset, outperforming prior RGB and event methods especially in low light and occlusion.

PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.

SoK: The Next Frontier in AV Security: Systematizing Perception Attacks and the Emerging Threat of Multi-Sensor Fusion

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

The paper organizes perception attacks on AVs into a new taxonomy, identifies gaps in fusion-aware defenses, and validates one cross-sensor vulnerability with a proof-of-concept simulation.

Efficient Multi-View 3D Object Detection by Dynamic Token Selection and Fine-Tuning

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

Dynamic token selection and training only 1.6 million parameters instead of over 300 million reduces computation by 48-55% and improves accuracy over prior state-of-the-art on the NuScenes dataset.

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.

Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

cs.CV · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

HiPR improves 3D occupancy prediction by reparameterizing image-to-voxel projections using LiDAR-derived height priors to adapt sampling ranges to scene sparsity and height variations.

SimPB++: Simultaneously Detecting 2D and 3D Objects from Multiple Cameras

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse2.

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.

ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.

DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.

InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making

cs.CV · 2026-05-05 · unverdicted · novelty 5.0

Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.

SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.

Radar-Camera BEV Multi-Task Learning with Cross-Task Attention Bridge for Joint 3D Detection and Segmentation

cs.CV · 2026-04-14 · unverdicted · novelty 5.0

CTAB exchanges features between detection and segmentation via multi-scale deformable attention in BEV space, yielding segmentation gains on 7 nuScenes classes at neutral detection cost.

Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.

Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving

cs.CV · 2026-04-06 · unverdicted · novelty 5.0

MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.

BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving

cs.CV · 2026-04-03 · unverdicted · novelty 5.0

BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.

citing papers explorer

Showing 18 of 18 citing papers.

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving cs.CR · 2026-05-12 · unverdicted · none · ref 17
Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.
EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras cs.CV · 2026-05-12 · unverdicted · none · ref 46
EgoEV-HandPose uses stereo event cameras and a bird's-eye-view fusion module to achieve 30.54 mm MPJPE and 86.87% gesture accuracy on a new large-scale egocentric dataset, outperforming prior RGB and event methods especially in low light and occlusion.
PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations cs.CV · 2026-05-12 · unverdicted · none · ref 10
PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.
SoK: The Next Frontier in AV Security: Systematizing Perception Attacks and the Emerging Threat of Multi-Sensor Fusion cs.CR · 2026-04-22 · unverdicted · none · ref 38
The paper organizes perception attacks on AVs into a new taxonomy, identifies gaps in fusion-aware defenses, and validates one cross-sensor vulnerability with a proof-of-concept simulation.
Efficient Multi-View 3D Object Detection by Dynamic Token Selection and Fine-Tuning cs.CV · 2026-04-15 · unverdicted · none · ref 8
Dynamic token selection and training only 1.6 million parameters instead of over 300 million reduces computation by 48-55% and improves accuracy over prior state-of-the-art on the NuScenes dataset.
DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather cs.CV · 2026-04-09 · unverdicted · none · ref 13
DinoRADE reports a radar-centered multi-class detection pipeline that fuses dense radar tensors with DINOv3 features via deformable attention and outperforms prior radar-camera methods by 12.1% on the K-Radar dataset across weather conditions.
Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy cs.CV · 2026-05-06 · unverdicted · none · ref 15 · 2 links
HiPR improves 3D occupancy prediction by reparameterizing image-to-voxel projections using LiDAR-derived height priors to adapt sampling ranges to scene sparsity and height variations.
SimPB++: Simultaneously Detecting 2D and 3D Objects from Multiple Cameras cs.CV · 2026-05-03 · unverdicted · none · ref 33
SimPB++ unifies multi-view 2D perspective and 3D BEV object detection in one model via an interactive hybrid decoder, reporting state-of-the-art results on nuScenes and long-range detection up to 150 m on Argoverse2.
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models cs.CV · 2026-04-20 · unverdicted · none · ref 21
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras cs.CV · 2026-04-18 · unverdicted · none · ref 5
CAM3DNet outperforms prior camera-based 3D detectors on nuScenes, Waymo and Argoverse by using three new modules to better mine multi-scale spatiotemporal features from 2D queries and pyramid maps.
ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation cs.CV · 2026-04-15 · unverdicted · none · ref 10
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.
DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale cs.CV · 2026-04-01 · unverdicted · none · ref 18
DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.
InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making cs.CV · 2026-05-05 · unverdicted · none · ref 26
Integrating DVS event data into InterFuser through token fusion yields a driving score of 77.2 and 100% route completion on CARLA benchmarks, indicating improved robustness in dynamic conditions.
SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection cs.CV · 2026-04-20 · unverdicted · none · ref 15
SemLT3D introduces semantic-guided expert distillation with a language MoE module and CLIP projection to enrich features for long-tailed classes in camera-only 3D detection.
Radar-Camera BEV Multi-Task Learning with Cross-Task Attention Bridge for Joint 3D Detection and Segmentation cs.CV · 2026-04-14 · unverdicted · none · ref 4
CTAB exchanges features between detection and segmentation via multi-scale deformable attention in BEV space, yielding segmentation gains on 7 nuScenes classes at neutral detection cost.
Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning cs.CV · 2026-04-07 · unverdicted · none · ref 10
GameAD models autonomous driving as a risk-prioritized game among agents via Risk-Aware Topology Anchoring, Minimax Risk-Aware Sparse Attention and related components, yielding safer trajectories than prior end-to-end methods on nuScenes and Bench2Drive.
Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving cs.CV · 2026-04-06 · unverdicted · none · ref 8
MMF-BEV fuses camera and radar branches with deformable self- and cross-attention, outperforming unimodal baselines on the VoD 4D radar dataset through a two-stage training process.
BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving cs.CV · 2026-04-03 · unverdicted · none · ref 10
BEVPredFormer uses attention-based temporal processing and 3D camera projection to match or exceed prior methods on nuScenes for BEV instance prediction.

arXiv preprint arXiv:2112.11790 (2021)

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer