pith.

arxiv: 2506.11419 · v1 · submitted 2025-06-13 · 💻 cs.AI · cs.RO

FocalAD: Local Motion Planning for End-to-End Autonomous Driving

Pith reviewed 2026-05-19 10:11 UTC · model grok-4.3

classification 💻 cs.AI cs.RO
keywords autonomous driving · end-to-end planning · local motion · agent interaction · graph representation · collision avoidance · nuScenes · robustness testing

The pith

FocalAD improves end-to-end driving by focusing planning on critical local agent interactions rather than global scene features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

End-to-end autonomous driving requires accurate motion prediction to guide the ego vehicle safely. Existing approaches often aggregate features from all agents across the scene, which can overlook the close-range interactions that most directly affect planning choices. FocalAD counters this by constructing a graph of interactions centered on the ego vehicle and applying extra emphasis in training to the neighbors that matter most for decisions. A reader would care because clearer attention to these local risks could translate into fewer planning errors when the vehicle faces dense or unpredictable traffic. The reported experiments indicate that this shift produces measurable gains on both standard and robustness-oriented test sets.

Core claim

FocalAD claims that an end-to-end driving model achieves more reliable planning when it augments motion representations through an ego-centric graph interactor that models dynamics with nearby agents and a loss term that raises the training weight of those agents most relevant to the ego plan, resulting in stronger open-loop and closed-loop performance than prior methods and especially large collision reductions on adversarial variants of nuScenes.

What carries the argument

The Ego-Local-Agents Interactor that builds a graph-based representation of motion dynamics between the ego vehicle and its immediate neighbors, together with the Focal-Local-Agents Loss that assigns higher importance to decision-critical agents during training.
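The abstract does not say how ELAI builds its graph, so what follows is only a minimal sketch under stated assumptions. The graph is taken to be a star with the ego as hub and its selected neighbors as leaves. Neighbors are chosen by a distance radius plus a maximum count, the rule described in the simulated rebuttal below, with placeholder values. One round of scaled dot-product attention with identity projections stands in for the learned message passing. `Agent`, `select_local_neighbors`, and `ego_local_interaction` are hypothetical names. The real module also refines agent motion queries; this sketch updates only the ego feature.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float  # position in the ego frame (m); the ego sits at the origin
    y: float
    feature: list  # motion-query embedding (hypothetical; same length as the ego feature)


def select_local_neighbors(agents, radius=30.0, max_count=8):
    """Keep agents within `radius` of the ego, then at most `max_count` nearest.

    Both defaults are placeholders, not FocalAD's hyper-parameters.
    """
    in_range = [a for a in agents if math.hypot(a.x, a.y) <= radius]
    in_range.sort(key=lambda a: math.hypot(a.x, a.y))
    return in_range[:max_count]


def ego_local_interaction(ego_feature, neighbors):
    """One message-passing step on a star graph centered on the ego.

    The ego feature is the query; neighbor features serve as keys and values
    (identity projections assumed). The attention-weighted message is added
    back to the ego feature as a residual update.
    """
    if not neighbors:
        return list(ego_feature)
    d = len(ego_feature)
    scores = [sum(q * k for q, k in zip(ego_feature, n.feature)) / math.sqrt(d)
              for n in neighbors]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    message = [sum(w * n.feature[i] for w, n in zip(weights, neighbors))
               for i in range(d)]
    return [e + m for e, m in zip(ego_feature, message)]


agents = [Agent(3.0, 0.0, [1.0, 0.0]), Agent(0.0, 10.0, [0.0, 1.0]),
          Agent(0.0, 50.0, [1.0, 1.0])]
local = select_local_neighbors(agents)  # the agent 50 m away is dropped
print(ego_local_interaction([1.0, 1.0], local))  # [1.5, 1.5]
```

Widening `radius` or raising `max_count` pushes the sketch toward a global aggregator, which is why these two values act as the free parameters that define what "local" means.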

Load-bearing premise

Planning decisions are shaped primarily by a small number of nearby interacting agents whose effects can be captured adequately by an ego-centered graph.

What would settle it

A controlled experiment on a new test set containing many equally influential distant agents where FocalAD shows no advantage over global-feature baselines would indicate that local focus is not the decisive factor.

Original abstract

In end-to-end autonomous driving, the motion prediction plays a pivotal role in ego-vehicle planning. However, existing methods often rely on globally aggregated motion features, ignoring the fact that planning decisions are primarily influenced by a small number of locally interacting agents. Failing to attend to these critical local interactions can obscure potential risks and undermine planning reliability. In this work, we propose FocalAD, a novel end-to-end autonomous driving framework that focuses on critical local neighbors and refines planning by enhancing local motion representations. Specifically, FocalAD comprises two core modules: the Ego-Local-Agents Interactor (ELAI) and the Focal-Local-Agents Loss (FLA Loss). ELAI conducts a graph-based ego-centric interaction representation that captures motion dynamics with local neighbors to enhance both ego planning and agent motion queries. FLA Loss increases the weights of decision-critical neighboring agents, guiding the model to prioritize those more relevant to planning. Extensive experiments show that FocalAD outperforms existing state-of-the-art methods on the open-loop nuScenes datasets and closed-loop Bench2Drive benchmark. Notably, on the robustness-focused Adv-nuScenes dataset, FocalAD achieves even greater improvements, reducing the average colilision rate by 41.9% compared to DiffusionDrive and by 15.6% compared to SparseDrive.
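FLA Loss is described only as increasing the weights of decision-critical neighbors. A minimal sketch, assuming a per-agent average displacement error as the base motion loss and a boolean flag marking which agents are critical (how FocalAD decides criticality is not stated here), is a normalized weighted mean in which flagged agents get weight `1 + boost`. The function name and the `boost` coefficient are hypothetical.

```python
import math


def focal_local_agents_loss(pred_trajs, gt_trajs, is_critical, boost=2.0):
    """Weighted mean of per-agent average displacement errors (hypothetical FLA sketch).

    pred_trajs, gt_trajs: one list of (x, y) waypoints per agent.
    is_critical: one bool per agent; critical agents get weight 1 + boost.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for pred, gt, critical in zip(pred_trajs, gt_trajs, is_critical):
        # Average Euclidean error over this agent's waypoints.
        ade = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
        weight = 1.0 + boost if critical else 1.0
        weighted_sum += weight * ade
        weight_total += weight
    # Dividing by the total weight keeps the result a weighted average of per-agent errors.
    return weighted_sum / weight_total if weight_total else 0.0


pred = [[(1.0, 0.0)], [(3.0, 0.0)]]
gt = [[(0.0, 0.0)], [(0.0, 0.0)]]
print(focal_local_agents_loss(pred, gt, [True, False]))             # 1.5
print(focal_local_agents_loss(pred, gt, [True, False], boost=0.0))  # 2.0
```

With `boost=0` the loss reduces to a plain mean over agents; raising it makes errors on flagged agents dominate, so `boost` plays the role of the tunable weighting coefficient that the ledger lists as a free parameter.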

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces FocalAD, an end-to-end autonomous driving framework that focuses on critical local neighbors rather than global motion features. It proposes the Ego-Local-Agents Interactor (ELAI) for graph-based ego-centric interaction representations and the Focal-Local-Agents Loss (FLA Loss) to upweight decision-critical agents. The central claim is that this local-motion emphasis yields state-of-the-art results on open-loop nuScenes, closed-loop Bench2Drive, and especially the robustness-focused Adv-nuScenes dataset, with reported collision-rate reductions of 41.9% versus DiffusionDrive and 15.6% versus SparseDrive.
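The headline figures are relative reductions, (baseline - FocalAD) / baseline. Assuming both percentages are computed on the same averaged collision-rate metric, they also fix how the two baselines compare: FocalAD's rate is 0.581 × DiffusionDrive's and 0.844 × SparseDrive's, so SparseDrive's rate would be about 0.581 / 0.844 ≈ 0.69 of DiffusionDrive's. A small Python check of that arithmetic (function names are illustrative):

```python
def relative_reduction(baseline_rate, new_rate):
    """Fractional drop of new_rate relative to baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


def implied_baseline_ratio(reduction_vs_a, reduction_vs_b):
    """Given new = (1 - r_a) * A and new = (1 - r_b) * B, return B / A."""
    return (1.0 - reduction_vs_a) / (1.0 - reduction_vs_b)


# A = DiffusionDrive (41.9% reduction), B = SparseDrive (15.6% reduction)
ratio = implied_baseline_ratio(0.419, 0.156)
print(f"SparseDrive / DiffusionDrive collision rate: {ratio:.3f}")  # 0.688
```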

Significance. If the attribution to local focus holds, the work would be significant for shifting end-to-end driving research toward ego-centric local representations that better capture interaction risks. The evaluations span open- and closed-loop settings plus an adversarial benchmark, providing a reasonably broad empirical basis. Explicit credit is due for including a robustness dataset that highlights safety-relevant gains.

major comments (3)
  1. [Abstract / Experimental Results] Abstract and Experimental Results: the reported gains (e.g., 41.9% average collision-rate reduction on Adv-nuScenes) are given without error bars, variance estimates, or statistical significance tests, so the reliability of the outperformance claim cannot be assessed from the presented data.
  2. [ELAI Module] ELAI description: the mechanism for selecting the local neighbors (radius or count) that enter the graph-based ego-centric representation is not specified, even though this choice is a free parameter that directly determines what counts as 'local' and therefore underpins the central premise.
  3. [Experiments] Experiments: no controlled ablations isolate the contribution of ELAI or FLA Loss while holding model capacity, training recipe, and feature extractors fixed; without them the attribution of SOTA gains to the local-interaction mechanisms remains unverified and the central claim is load-bearing on untested design choices.
minor comments (2)
  1. [Abstract] Abstract: 'colilision rate' is a typographical error and should read 'collision rate'.
  2. [Abstract] Abstract: the opening sentence equates motion prediction with planning; a brief clarification of how the two are distinguished in the proposed pipeline would improve precision.
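Major comment 3 asks for ablations that toggle only ELAI and FLA Loss. A minimal sketch of such a grid, assuming a hypothetical `RunConfig` whose shared fields (placeholder values, not FocalAD's training recipe) are frozen so every run differs only in the two component flags:

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    # Shared settings held identical across all runs (placeholder values).
    backbone: str = "shared-backbone"
    epochs: int = 12
    seed: int = 0
    # The two components under test.
    use_elai: bool = False
    use_fla_loss: bool = False


def ablation_grid(base):
    """Baseline, +ELAI, +FLA Loss, and +both, all derived from one base config."""
    return [replace(base, use_elai=elai, use_fla_loss=fla)
            for elai, fla in product([False, True], repeat=2)]


for cfg in ablation_grid(RunConfig()):
    print(cfg.use_elai, cfg.use_fla_loss)
```

Deriving every run from one frozen base with `dataclasses.replace` makes it hard to change capacity or schedule by accident; repeating the grid over several seeds would also address major comment 1.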

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We provide point-by-point responses to the major comments and outline the changes we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract / Experimental Results] Abstract and Experimental Results: the reported gains (e.g., 41.9% average collision-rate reduction on Adv-nuScenes) are given without error bars, variance estimates, or statistical significance tests, so the reliability of the outperformance claim cannot be assessed from the presented data.

    Authors: We agree that error bars, variance estimates, and statistical significance tests would strengthen the assessment of our results. In the revised manuscript we will report standard deviations computed over multiple random seeds for the key metrics on Adv-nuScenes and include appropriate statistical tests comparing FocalAD against the strongest baselines. revision: yes

  2. Referee: [ELAI Module] ELAI description: the mechanism for selecting the local neighbors (radius or count) that enter the graph-based ego-centric representation is not specified, even though this choice is a free parameter that directly determines what counts as 'local' and therefore underpins the central premise.

    Authors: We acknowledge that the neighbor-selection procedure in ELAI should be stated more explicitly. In the revised version we will add a precise description of the selection rule (distance radius combined with a maximum agent count) together with the concrete hyper-parameter values used in all reported experiments and a brief discussion of their effect on the local-interaction premise. revision: yes

  3. Referee: [Experiments] Experiments: no controlled ablations isolate the contribution of ELAI or FLA Loss while holding model capacity, training recipe, and feature extractors fixed; without them the attribution of SOTA gains to the local-interaction mechanisms remains unverified and the central claim is load-bearing on untested design choices.

    Authors: We recognize the value of controlled ablations that isolate each proposed component. While the current experiments contain baseline comparisons and partial component studies, we will add new ablation tables in the revised manuscript that incrementally enable ELAI and FLA Loss on top of an otherwise identical backbone, keeping model capacity, training schedule, and feature extractors fixed. revision: yes
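The first response promises per-seed standard deviations and statistical tests. One stdlib-only way to do both, sketched here as an assumption rather than the authors' procedure, is a mean and sample standard deviation per method plus a paired sign-flip permutation test on per-seed differences, with runs paired by seed. The function names are hypothetical, and any numbers passed in would be placeholders, not results from the paper.

```python
import random
import statistics


def summarize(values):
    """Mean and sample standard deviation across random seeds."""
    return statistics.mean(values), statistics.stdev(values)


def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on per-seed differences.

    a[i] and b[i] must come from the same seed and evaluation split.
    Returns a Monte Carlo p-value for "the mean difference is zero".
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    # The +1 terms count the observed assignment and keep p strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

One caveat: with n paired seeds the sign-flip test has only 2^n outcomes, so with 5 seeds the smallest achievable two-sided p-value is 2/32 = 0.0625. Reaching conventional thresholds therefore needs more seeds, or pairing at the scenario level instead of the seed level.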

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks

full rationale

The manuscript introduces FocalAD with ELAI graph module and FLA Loss weighting, then reports open-loop and closed-loop metrics on fixed public datasets (nuScenes, Adv-nuScenes, Bench2Drive) against independent prior baselines. No equations, loss terms, or predictions are shown to be algebraically identical to fitted inputs or to prior self-citations; the central performance numbers are direct experimental outputs, not quantities defined by construction inside the paper. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on one central domain assumption about the dominance of local interactions and on standard supervised learning machinery. It introduces no new entities, and its only free parameters are the two tunable choices listed below: the neighbor-selection rule and the FLA Loss weighting.

free parameters (2)
  • neighbor selection radius or count
    The definition of which agents count as 'local' must be chosen and is not derived from first principles.
  • FLA Loss weighting coefficients
    The relative importance given to critical neighbors is a tunable hyper-parameter.
axioms (1)
  • domain assumption Planning decisions are primarily influenced by a small number of locally interacting agents
    Explicitly stated as the motivation for shifting from global to local motion features.

pith-pipeline@v0.9.0 · 5793 in / 1362 out tokens · 46732 ms · 2026-05-19T10:11:44.507443+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DriveFuture: Future-Aware Latent World Models for Autonomous Driving

    cs.CV 2026-05 unverdicted novelty 6.0

    DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17853–17862 (2023)

  2. [2]

    Jiang, B., Chen, S., Xu, Q., Liao, B., Chen, J., Zhou, H., Zhang, Q., Liu, W., Huang, C., Wang, X.: Vad: Vectorized scene representation for efficient autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8340–8350 (2023)

  3. [3]

    Sun, W., Lin, X., Shi, Y., Zhang, C., Wu, H., Zheng, S.: SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation. arXiv preprint arXiv:2405.19620 (2024)

  4. [4]

    Liao, B., Chen, S., Yin, H., Jiang, B., Wang, C., Yan, S., Zhang, X., Li, X., Zhang, Y., Zhang, Q., et al.: Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. arXiv preprint arXiv:2411.15139 (2024)

  5. [5]

    Xing, Z., Zhang, X., Hu, Y., Jiang, B., He, T., Zhang, Q., Long, X., Yin, W.: Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. arXiv preprint arXiv:2503.05689 (2025)

  6. [6]

    Chen, S., Jiang, B., Gao, H., Liao, B., Xu, Q., Zhang, Q., Huang, C., Liu, W., Wang, X.: Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243 (2024)

  7. [7]

    Zhang, Y., Qian, D., Li, D., Pan, Y., Chen, Y., Liang, Z., Zhang, Z., Zhang, S., Li, H., Fu, M., et al.: Graphad: Interaction scene graph for end-to-end autonomous driving. arXiv preprint arXiv:2403.19098 (2024)

  8. [8]

    Zhang, B., Song, N., Jin, X., Zhang, L.: Bridging past and future: End-to-end autonomous driving with historical prediction and planning. arXiv preprint arXiv:2503.14182 (2025)

  9. [9]

    Song, Z., Jia, C., Liu, L., Pan, H., Zhang, Y., Wang, J., Zhang, X., Xu, S., Yang, L., Luo, Y.: Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving. arXiv preprint arXiv:2503.03125 (2025)

  10. [10]

    Chen, L., Wu, P., Chitta, K., Jaeger, B., Geiger, A., Li, H.: End-to-end autonomous driving: Challenges and frontiers. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  11. [11]

    Chib, P.S., Singh, P.: Recent advancements in end-to-end autonomous driving using deep learning: A survey. IEEE Transactions on Intelligent Vehicles 9(1), 103–118 (2023)

  12. [12]

    Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T., Yu, Q., Dai, J.: Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. arXiv preprint arXiv:2203.17270 (2022)

  13. [13]

    Wang, S., Liu, Y., Wang, T., Li, Y., Zhang, X.: Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3621–3631 (2023)

  14. [14]

    Lin, X., Pei, Z., Lin, T., Huang, L., Su, Z.: Sparse4d v3: Advancing end-to-end 3d detection and tracking. arXiv preprint arXiv:2311.11722 (2023)

  15. [15]

    Shi, S., Jiang, L., Dai, D., Schiele, B.: Motion transformer with global intention localization and local movement refinement. Advances in Neural Information Processing Systems 35, 6531–6543 (2022)

  16. [16]

    Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: Trackformer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844–8854 (2022)

  17. [17]

    Cheng, J., Chen, Y., Mei, X., Yang, B., Li, B., Liu, M.: Rethinking imitation-based planners for autonomous driving. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 14123–14130 (2024). IEEE

  18. [18]

    Fan, H., Zhu, F., Liu, C., Zhang, L., Zhuang, L., Li, D., Zhu, W., Hu, J., Li, H., Kong, Q.: Baidu apollo em motion planner. arXiv preprint arXiv:1807.08048 (2018)

  19. [19]

    Jia, X., Wu, P., Chen, L., Xie, J., He, C., Yan, J., Li, H.: Think twice before driving: Towards scalable decoders for end-to-end autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21983–21994 (2023)

  20. [20]

    Ye, T., Jing, W., Hu, C., Huang, S., Gao, L., Li, F., Wang, J., Guo, K., Xiao, W., Mao, W., et al.: Fusionad: Multi-modality fusion for prediction and planning tasks of autonomous driving. arXiv preprint arXiv:2308.01006 (2023)

  21. [21]

    Fu, J., Shen, Y., Jian, Z., Chen, S., Xin, J., Zheng, N.: Interactionnet: Joint planning and prediction for autonomous driving with transformers. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9332–9339 (2023). IEEE

  22. [22]

    Chen, J., Xu, Z., Tomizuka, M.: End-to-end autonomous driving perception with sequential latent representation learning. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1999–2006 (2020). IEEE

  23. [23]

    Teng, S., Chen, L., Ai, Y., Zhou, Y., Xuanyuan, Z., Hu, X.: Hierarchical interpretable imitation learning for end-to-end autonomous driving. IEEE Transactions on Intelligent Vehicles 8(1), 673–683 (2022)

  24. [24]

    Jia, X., Gao, Y., Chen, L., Yan, J., Liu, P.L., Li, H.: Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7953–7963 (2023)

  25. [25]

    Li, Z., Wang, W., Xie, E., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P., Lu, T.: Panoptic segformer: Delving deeper into panoptic segmentation with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1280–1289 (2022)

  26. [26]

    Chen, Z., Ye, M., Xu, S., Cao, T., Chen, Q.: Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving. In: European Conference on Computer Vision, pp. 239–256 (2024). Springer

  27. [27]

    Zheng, W., Song, R., Guo, X., Zhang, C., Chen, L.: Genad: Generative end-to-end autonomous driving. In: European Conference on Computer Vision, pp. 87–104 (2024). Springer

  28. [28]

    Qian, K., Luo, Z., Jiang, S., Huang, Z., Miao, J., Ma, Z., Zhu, T., Li, J., He, Y., Fu, Z., et al.: Fasionad++: Integrating high-level instruction and information bottleneck in fat-slow fusion systems for enhanced safety in autonomous driving with adaptive feedback. arXiv preprint arXiv:2503.08162 (2025)

  29. [29]

    Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)

  30. [30]

    Jia, X., Yang, Z., Li, Q., Zhang, Z., Yan, J.: Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. arXiv preprint arXiv:2406.03877 (2024)

  31. [31]

    Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: An open urban driving simulator. In: Conference on Robot Learning, pp. 1–16 (2017). PMLR

  32. [32]

    Xu, Z., Li, B., Gao, H.-a., Gao, M., Chen, Y., Liu, M., Yan, C., Zhao, H., Feng, S., Zhao, H.: Challenger: Affordable adversarial driving video generation. arXiv preprint arXiv:2505.15880 (2025)

  33. [33]

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  34. [34]

    Gu, J., Hu, C., Zhang, T., Chen, X., Wang, Y., Wang, Y., Zhao, H.: Vip3d: End-to-end visual trajectory prediction via 3d agent queries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5496–5506 (2023)

  35. [35]

    Liang, M., Yang, B., Zeng, W., Chen, Y., Hu, R., Casas, S., Urtasun, R.: Pnpnet: End-to-end perception and prediction with tracking in the loop. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11553–11562 (2020)

  36. [36]

    Hu, S., Chen, L., Wu, P., Li, H., Yan, J., Tao, D.: St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In: European Conference on Computer Vision, pp. 533–549 (2022). Springer

  37. [37]

    Dauner, D., Hallgarten, M., Li, T., Weng, X., Huang, Z., Yang, Z., Li, H., Gilitschenski, I., Ivanovic, B., Pavone, M., et al.: Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems 37, 28706–28719 (2024)