FocalAD: Local Motion Planning for End-to-End Autonomous Driving
Pith reviewed 2026-05-19 10:11 UTC · model grok-4.3
The pith
FocalAD improves end-to-end driving by focusing planning on critical local agent interactions rather than global scene features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FocalAD claims that an end-to-end driving model plans more reliably when it combines two components: an ego-centric graph interactor that models motion dynamics with nearby agents, and a loss term that raises the training weight of the agents most relevant to the ego plan. The reported result is stronger open-loop and closed-loop performance than prior methods, with especially large collision reductions on adversarial variants of nuScenes.
What carries the argument
The Ego-Local-Agents Interactor, which builds a graph-based representation of motion dynamics between the ego vehicle and its immediate neighbors, together with the Focal-Local-Agents Loss, which assigns higher importance to decision-critical agents during training.
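To make the idea of upweighting decision-critical agents concrete, here is a minimal sketch of a distance-weighted motion loss. It is not the paper's FLA Loss: the source only says that nearby, decision-critical agents receive higher weights, so the linear distance-based weighting, the function names, and the `radius` and `boost` parameters are all assumptions chosen for illustration.

```python
def focal_local_weights(distances, radius=30.0, boost=2.0):
    """Return one training weight per agent.

    Hypothetical rule: weight is 1.0 at or beyond `radius` metres and
    grows linearly to 1.0 + boost for an agent at the ego position.
    """
    weights = []
    for d in distances:
        closeness = max(0.0, 1.0 - d / radius)  # 1 at ego, 0 at or beyond radius
        weights.append(1.0 + boost * closeness)
    return weights


def weighted_motion_loss(per_agent_errors, distances, radius=30.0, boost=2.0):
    """Weighted average of per-agent prediction errors (illustrative only)."""
    weights = focal_local_weights(distances, radius, boost)
    total = sum(w * e for w, e in zip(weights, per_agent_errors))
    return total / sum(weights)


if __name__ == "__main__":
    # One close agent with a large error, three distant agents with none.
    print(weighted_motion_loss([2.0, 0.0, 0.0, 0.0], [0.0, 60.0, 60.0, 60.0]))
```

With these assumed settings, an error on a close agent contributes more to the loss than the same error on a distant agent, which is the qualitative behaviour the summary attributes to the FLA Loss.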
Load-bearing premise
Planning decisions are shaped primarily by a small number of nearby interacting agents whose effects can be captured adequately by an ego-centered graph.
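A minimal sketch of what "a small number of nearby interacting agents" could mean operationally is given below. The paper's actual selection rule is not stated in this review, so the rule used here (a distance radius combined with a cap on the number of agents), the function names, and the default values are assumptions for illustration only.

```python
import math


def select_local_agents(ego_xy, agents_xy, radius=30.0, max_agents=8):
    """Indices of agents within `radius` of the ego vehicle, nearest first,
    capped at `max_agents`. Both thresholds are hypothetical."""
    scored = []
    for idx, (x, y) in enumerate(agents_xy):
        dist = math.hypot(x - ego_xy[0], y - ego_xy[1])
        if dist <= radius:
            scored.append((dist, idx))
    scored.sort()  # nearest agents first
    return [idx for _, idx in scored[:max_agents]]


def ego_star_edges(local_indices):
    """Edges of an ego-centred star graph; the ego node is labelled -1."""
    return [(-1, idx) for idx in local_indices]


if __name__ == "__main__":
    agents = [(3.0, 4.0), (100.0, 0.0), (1.0, 0.0), (0.0, -20.0)]
    local = select_local_agents((0.0, 0.0), agents, radius=30.0, max_agents=2)
    print(local, ego_star_edges(local))
```

Both `radius` and `max_agents` directly change which agents count as local, which is why the premise depends on how they are set.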
What would settle it
If FocalAD showed no advantage over global-feature baselines in a controlled experiment on a new test set containing many equally influential distant agents, that would indicate that local focus is not the decisive factor.
Original abstract
In end-to-end autonomous driving, the motion prediction plays a pivotal role in ego-vehicle planning. However, existing methods often rely on globally aggregated motion features, ignoring the fact that planning decisions are primarily influenced by a small number of locally interacting agents. Failing to attend to these critical local interactions can obscure potential risks and undermine planning reliability. In this work, we propose FocalAD, a novel end-to-end autonomous driving framework that focuses on critical local neighbors and refines planning by enhancing local motion representations. Specifically, FocalAD comprises two core modules: the Ego-Local-Agents Interactor (ELAI) and the Focal-Local-Agents Loss (FLA Loss). ELAI conducts a graph-based ego-centric interaction representation that captures motion dynamics with local neighbors to enhance both ego planning and agent motion queries. FLA Loss increases the weights of decision-critical neighboring agents, guiding the model to prioritize those more relevant to planning. Extensive experiments show that FocalAD outperforms existing state-of-the-art methods on the open-loop nuScenes datasets and closed-loop Bench2Drive benchmark. Notably, on the robustness-focused Adv-nuScenes dataset, FocalAD achieves even greater improvements, reducing the average colilision rate by 41.9% compared to DiffusionDrive and by 15.6% compared to SparseDrive.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FocalAD, an end-to-end autonomous driving framework that focuses on critical local neighbors rather than global motion features. It proposes the Ego-Local-Agents Interactor (ELAI) for graph-based ego-centric interaction representations and the Focal-Local-Agents Loss (FLA Loss) to upweight decision-critical agents. The central claim is that this local-motion emphasis yields state-of-the-art results on open-loop nuScenes, closed-loop Bench2Drive, and especially the robustness-focused Adv-nuScenes dataset, with reported collision-rate reductions of 41.9% versus DiffusionDrive and 15.6% versus SparseDrive.
Significance. If the attribution to local focus holds, the work would be significant for shifting end-to-end driving research toward ego-centric local representations that better capture interaction risks. The evaluations span open- and closed-loop settings plus an adversarial benchmark, providing a reasonably broad empirical basis. Explicit credit is due for including a robustness dataset that highlights safety-relevant gains.
major comments (3)
- [Abstract / Experimental Results] Abstract and Experimental Results: the reported gains (e.g., 41.9% average collision-rate reduction on Adv-nuScenes) are given without error bars, variance estimates, or statistical significance tests, so the reliability of the outperformance claim cannot be assessed from the presented data.
- [ELAI Module] ELAI description: the mechanism for selecting the local neighbors (radius or count) that enter the graph-based ego-centric representation is not specified, even though this choice is a free parameter that directly determines what counts as 'local' and therefore underpins the central premise.
- [Experiments] Experiments: no controlled ablations isolate the contribution of ELAI or FLA Loss while holding model capacity, training recipe, and feature extractors fixed; without them the attribution of SOTA gains to the local-interaction mechanisms remains unverified and the central claim is load-bearing on untested design choices.
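The first major comment asks for variance estimates and significance tests. One way such reporting could look is sketched below using only the Python standard library: mean and sample standard deviation per method over random seeds, plus Welch's t statistic for the difference. The collision-rate values in the usage example are made up for illustration and are not results from the paper.

```python
import math
import statistics


def summarize(values):
    """Mean and sample standard deviation of a metric over seeds."""
    return statistics.mean(values), statistics.stdev(values)


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    standard_error = math.sqrt(sd_a ** 2 / len(a) + sd_b ** 2 / len(b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Hypothetical collision rates (%) over three seeds; not from the paper.
    method_a = [1.0, 2.0, 3.0]
    method_b = [2.0, 3.0, 4.0]
    print(summarize(method_a), summarize(method_b), welch_t(method_a, method_b))
```

Turning the t statistic into a p-value also requires the Welch-Satterthwaite degrees of freedom and a t-distribution, which a statistics package would normally supply.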
minor comments (2)
- [Abstract] Abstract: 'colilision rate' is a typographical error and should read 'collision rate'.
- [Abstract] Abstract: the opening sentence equates motion prediction with planning; a brief clarification of how the two are distinguished in the proposed pipeline would improve precision.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We provide point-by-point responses to the major comments and outline the changes we will make to the manuscript.
- Referee: [Abstract / Experimental Results] Abstract and Experimental Results: the reported gains (e.g., 41.9% average collision-rate reduction on Adv-nuScenes) are given without error bars, variance estimates, or statistical significance tests, so the reliability of the outperformance claim cannot be assessed from the presented data.
  Authors: We agree that error bars, variance estimates, and statistical significance tests would strengthen the assessment of our results. In the revised manuscript we will report standard deviations computed over multiple random seeds for the key metrics on Adv-nuScenes and include appropriate statistical tests comparing FocalAD against the strongest baselines.
  Revision: yes
- Referee: [ELAI Module] ELAI description: the mechanism for selecting the local neighbors (radius or count) that enter the graph-based ego-centric representation is not specified, even though this choice is a free parameter that directly determines what counts as 'local' and therefore underpins the central premise.
  Authors: We acknowledge that the neighbor-selection procedure in ELAI should be stated more explicitly. In the revised version we will add a precise description of the selection rule (distance radius combined with a maximum agent count) together with the concrete hyper-parameter values used in all reported experiments and a brief discussion of their effect on the local-interaction premise.
  Revision: yes
- Referee: [Experiments] Experiments: no controlled ablations isolate the contribution of ELAI or FLA Loss while holding model capacity, training recipe, and feature extractors fixed; without them the attribution of SOTA gains to the local-interaction mechanisms remains unverified and the central claim is load-bearing on untested design choices.
  Authors: We recognize the value of controlled ablations that isolate each proposed component. While the current experiments contain baseline comparisons and partial component studies, we will add new ablation tables in the revised manuscript that incrementally enable ELAI and FLA Loss on top of an otherwise identical backbone, keeping model capacity, training schedule, and feature extractors fixed.
  Revision: yes
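The controlled ablation discussed in the last response can be pictured as a 2x2 grid over the two components, with everything else held fixed. The sketch below enumerates such a grid; the configuration keys (`use_elai`, `use_fla_loss`) and the base settings are hypothetical names, not options from the authors' code.

```python
from itertools import product


def ablation_grid(base_config):
    """Return four run configurations that differ only in whether ELAI and
    the FLA Loss are enabled. All other settings are copied unchanged."""
    runs = []
    for use_elai, use_fla_loss in product([False, True], repeat=2):
        config = dict(base_config)  # shallow copy keeps shared settings fixed
        config["use_elai"] = use_elai
        config["use_fla_loss"] = use_fla_loss
        runs.append(config)
    return runs


if __name__ == "__main__":
    # Hypothetical shared settings; the values are placeholders.
    base = {"backbone": "shared-backbone", "epochs": 10, "seed": 0}
    for run in ablation_grid(base):
        print(run)
```

Comparing the four runs separates the effect of each component from the effect of their combination, provided capacity, schedule, and feature extractors really are identical across runs.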
Circularity Check
No circularity: empirical claims rest on external benchmarks
The manuscript introduces FocalAD with ELAI graph module and FLA Loss weighting, then reports open-loop and closed-loop metrics on fixed public datasets (nuScenes, Adv-nuScenes, Bench2Drive) against independent prior baselines. No equations, loss terms, or predictions are shown to be algebraically identical to fitted inputs or to prior self-citations; the central performance numbers are direct experimental outputs, not quantities defined by construction inside the paper. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- neighbor selection radius or count
- FLA Loss weighting coefficients
axioms (1)
- domain assumption: Planning decisions are primarily influenced by a small number of locally interacting agents
Forward citations
Cited by 1 Pith paper
- DriveFuture: Future-Aware Latent World Models for Autonomous Driving. DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
Reference graph
Works this paper leans on
- [1] Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17853–17862 (2023)
- [2] Jiang, B., Chen, S., Xu, Q., Liao, B., Chen, J., Zhou, H., Zhang, Q., Liu, W., Huang, C., Wang, X.: Vad: Vectorized scene representation for efficient autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8340–8350 (2023)
- [3] Sun, W., Lin, X., Shi, Y., Zhang, C., Wu, H., Zheng, S.: SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation. arXiv preprint arXiv:2405.19620
- [4] Liao, B., Chen, S., Yin, H., Jiang, B., Wang, C., Yan, S., Zhang, X., Li, X., Zhang, Y., Zhang, Q., et al.: Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. arXiv preprint arXiv:2411.15139 (2024)
- [5] Xing, Z., Zhang, X., Hu, Y., Jiang, B., He, T., Zhang, Q., Long, X., Yin, W.: Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. arXiv preprint arXiv:2503.05689 (2025)
- [6] Chen, S., Jiang, B., Gao, H., Liao, B., Xu, Q., Zhang, Q., Huang, C., Liu, W., Wang, X.: Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243 (2024)
- [7] Zhang, Y., Qian, D., Li, D., Pan, Y., Chen, Y., Liang, Z., Zhang, Z., Zhang, S., Li, H., Fu, M., et al.: Graphad: Interaction scene graph for end-to-end autonomous driving. arXiv preprint arXiv:2403.19098 (2024)
- [8] Zhang, B., Song, N., Jin, X., Zhang, L.: Bridging past and future: End-to-end autonomous driving with historical prediction and planning. arXiv preprint arXiv:2503.14182 (2025)
- [9] Song, Z., Jia, C., Liu, L., Pan, H., Zhang, Y., Wang, J., Zhang, X., Xu, S., Yang, L., Luo, Y.: Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving. arXiv preprint arXiv:2503.03125 (2025)
- [10] Chen, L., Wu, P., Chitta, K., Jaeger, B., Geiger, A., Li, H.: End-to-end autonomous driving: Challenges and frontiers. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)
- [11] Chib, P.S., Singh, P.: Recent advancements in end-to-end autonomous driving using deep learning: A survey. IEEE Transactions on Intelligent Vehicles 9(1), 103–118 (2023)
- [12] Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T., Yu, Q., Dai, J.: Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. URL https://arxiv.org/abs/2203.17270 (2022)
- [13] Wang, S., Liu, Y., Wang, T., Li, Y., Zhang, X.: Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3621–3631 (2023)
- [14] Lin, X., Pei, Z., Lin, T., Huang, L., Su, Z.: Sparse4d v3: Advancing end-to-end 3d detection and tracking. arXiv preprint arXiv:2311.11722 (2023)
- [15] Shi, S., Jiang, L., Dai, D., Schiele, B.: Motion transformer with global intention localization and local movement refinement. Advances in Neural Information Processing Systems 35, 6531–6543 (2022)
- [16] Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: Trackformer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844–8854 (2022)
- [17] Cheng, J., Chen, Y., Mei, X., Yang, B., Li, B., Liu, M.: Rethinking imitation-based planners for autonomous driving. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 14123–14130 (2024). IEEE
- [18] Fan, H., Zhu, F., Liu, C., Zhang, L., Zhuang, L., Li, D., Zhu, W., Hu, J., Li, H., Kong, Q.: Baidu apollo em motion planner. arXiv preprint arXiv:1807.08048 (2018)
- [19] Jia, X., Wu, P., Chen, L., Xie, J., He, C., Yan, J., Li, H.: Think twice before driving: Towards scalable decoders for end-to-end autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21983–21994 (2023)
- [20] Ye, T., Jing, W., Hu, C., Huang, S., Gao, L., Li, F., Wang, J., Guo, K., Xiao, W., Mao, W., et al.: Fusionad: Multi-modality fusion for prediction and planning tasks of autonomous driving. arXiv preprint arXiv:2308.01006 (2023)
- [21] Fu, J., Shen, Y., Jian, Z., Chen, S., Xin, J., Zheng, N.: Interactionnet: Joint planning and prediction for autonomous driving with transformers. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9332–9339 (2023). IEEE
- [22] Chen, J., Xu, Z., Tomizuka, M.: End-to-end autonomous driving perception with sequential latent representation learning. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1999–2006 (2020). IEEE
- [23] Teng, S., Chen, L., Ai, Y., Zhou, Y., Xuanyuan, Z., Hu, X.: Hierarchical interpretable imitation learning for end-to-end autonomous driving. IEEE Transactions on Intelligent Vehicles 8(1), 673–683 (2022)
- [24] Jia, X., Gao, Y., Chen, L., Yan, J., Liu, P.L., Li, H.: Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7953–7963 (2023)
- [25] Li, Z., Wang, W., Xie, E., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P., Lu, T.: Panoptic segformer: Delving deeper into panoptic segmentation with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1280–1289 (2022)
- [26] Chen, Z., Ye, M., Xu, S., Cao, T., Chen, Q.: Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving. In: European Conference on Computer Vision, pp. 239–256 (2024). Springer
- [27] Zheng, W., Song, R., Guo, X., Zhang, C., Chen, L.: Genad: Generative end-to-end autonomous driving. In: European Conference on Computer Vision, pp. 87–104 (2024). Springer
- [28] Qian, K., Luo, Z., Jiang, S., Huang, Z., Miao, J., Ma, Z., Zhu, T., Li, J., He, Y., Fu, Z., et al.: Fasionad++: Integrating high-level instruction and information bottleneck in fat-slow fusion systems for enhanced safety in autonomous driving with adaptive feedback. arXiv preprint arXiv:2503.08162 (2025)
- [29] Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)
- [30] Jia, X., Yang, Z., Li, Q., Zhang, Z., Yan, J.: Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. arXiv preprint arXiv:2406.03877 (2024)
- [31] Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: An open urban driving simulator. In: Conference on Robot Learning, pp. 1–16 (2017). PMLR
- [32] Xu, Z., Li, B., Gao, H.-a., Gao, M., Chen, Y., Liu, M., Yan, C., Zhao, H., Feng, S., Zhao, H.: Challenger: Affordable adversarial driving video generation. arXiv preprint arXiv:2505.15880 (2025)
- [33] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
- [34] Gu, J., Hu, C., Zhang, T., Chen, X., Wang, Y., Wang, Y., Zhao, H.: Vip3d: End-to-end visual trajectory prediction via 3d agent queries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5496–5506 (2023)
- [35] Liang, M., Yang, B., Zeng, W., Chen, Y., Hu, R., Casas, S., Urtasun, R.: Pnpnet: End-to-end perception and prediction with tracking in the loop. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11553–11562 (2020)
- [36] Hu, S., Chen, L., Wu, P., Li, H., Yan, J., Tao, D.: St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In: European Conference on Computer Vision, pp. 533–549 (2022). Springer
- [37] Dauner, D., Hallgarten, M., Li, T., Weng, X., Huang, Z., Yang, Z., Li, H., Gilitschenski, I., Ivanovic, B., Pavone, M., et al.: Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems 37, 28706–28719 (2024)