Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners
Pith reviewed 2026-05-10 13:26 UTC · model grok-4.3
The pith
Mosaic uses arbitration graphs to combine rule-based and learned motion planners for safer autonomous driving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mosaic decouples trajectory generation from verification and scoring through arbitration graphs, which makes the combination of rule-based and learned planners transparent. The paper reports improved closed-loop performance on the nuPlan Val14 and interPlan benchmarks, with fewer at-fault collisions and better handling of interactive scenarios, all without additional training.
What carries the argument
Arbitration graphs, which structure decision-making by verifying and scoring the trajectories proposed by multiple planners at a level above the planners themselves.
Load-bearing premise
The arbitration graphs and unified scoring reliably select, in each situation, the trajectory of whichever planner performs best there, without creating new safety problems or reducing explainability.
What would settle it
Finding a driving scenario where the Mosaic planner causes a collision or unsafe behavior not seen in the individual rule-based or learned planners.
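The test described above can be phrased as a simple set query over per-scenario outcomes. The sketch below is illustrative only: the function name `novel_failures`, the planner labels, and the scenario IDs are hypothetical, and the collision flags are made-up placeholders rather than results from the paper.

```python
from typing import Dict, List


def novel_failures(
    collisions: Dict[str, Dict[str, bool]],
    combined: str,
    constituents: List[str],
) -> List[str]:
    """Return scenarios where the combined planner collides but no constituent does."""
    return sorted(
        scenario
        for scenario, hit in collisions[combined].items()
        if hit and not any(collisions[c].get(scenario, False) for c in constituents)
    )


if __name__ == "__main__":
    # Hypothetical at-fault collision flags per planner and scenario (not real data).
    flags = {
        "mosaic": {"s1": True, "s2": True, "s3": False},
        "rule_based": {"s1": True, "s2": False, "s3": False},
        "learned": {"s1": False, "s2": False, "s3": True},
    }
    print(novel_failures(flags, "mosaic", ["rule_based", "learned"]))  # prints ['s2']
```

A non-empty result would be the kind of counterexample this section asks for: a failure introduced by the composition rather than inherited from a constituent planner.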
Original abstract
Safe and explainable motion planning remains a central challenge in autonomous driving. While rule-based planners offer predictable and explainable behavior, they often fail to grasp the complexity and uncertainty of real-world traffic. Conversely, learned planners exhibit strong adaptability but suffer from reduced transparency and occasional safety violations. We introduce Mosaic, an extensible framework for structured decision-making that integrates both paradigms through arbitration graphs. By decoupling trajectory verification and scoring from the generation of trajectories by individual planners, every decision becomes transparent and traceable. Trajectory verification at a higher level introduces redundancy between the planners, limiting emergency braking to the rare case where all planners fail to produce a valid trajectory. Through unified scoring and optimal trajectory selection, rule-based and learned planners with complementary strengths and weaknesses can be combined to yield the best of both worlds. In experimental evaluation on nuPlan, Mosaic achieves 95.48 CLS-NR and 93.98 CLS-R on the Val14 closed-loop benchmark, setting a new state of the art, while reducing at-fault collisions by 30% compared to either planner in isolation. On the interPlan benchmark, focused on highly interactive and difficult scenarios, Mosaic scores 54.30 CLS-R, outperforming its best constituent planner by 23.3% - all without retraining or requiring additional data. The code is available at github.com/KIT-MRT/mosaic.
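The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes a single flat arbitration step (the paper's arbitration graphs may be hierarchical), and every name here (`arbitrate`, `Decision`, the toy verifier and scorer) is hypothetical rather than taken from the Mosaic code base. Each planner proposes a trajectory, a shared verifier rejects invalid ones, a shared scorer ranks the rest, and emergency braking is used only when no candidate survives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A trajectory is a list of (x, y) waypoints; purely illustrative.
Trajectory = List[Tuple[float, float]]


@dataclass
class Decision:
    """Selected trajectory plus a trace of how it was chosen."""
    source: str
    trajectory: Trajectory
    trace: List[str]


def arbitrate(
    planners: Sequence[Tuple[str, Callable[[], Optional[Trajectory]]]],
    verify: Callable[[Trajectory], bool],
    score: Callable[[Trajectory], float],
    emergency_brake: Callable[[], Trajectory],
) -> Decision:
    """Verify and score every planner's proposal, then pick the best valid one."""
    trace: List[str] = []
    best: Optional[Tuple[float, str, Trajectory]] = None
    for name, plan in planners:
        candidate = plan()
        if candidate is None:
            trace.append(f"{name}: no trajectory")
            continue
        if not verify(candidate):
            trace.append(f"{name}: rejected by verifier")
            continue
        value = score(candidate)
        trace.append(f"{name}: verified, score={value:.2f}")
        if best is None or value > best[0]:
            best = (value, name, candidate)
    if best is None:
        # Fallback is reached only when every planner failed verification.
        trace.append("fallback: emergency brake")
        return Decision("emergency_brake", emergency_brake(), trace)
    trace.append(f"selected: {best[1]}")
    return Decision(best[1], best[2], trace)


if __name__ == "__main__":
    # Toy planners, verifier, and scorer (assumptions for illustration only).
    planners = [
        ("rule_based", lambda: [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]),
        ("learned", lambda: [(0.0, 0.0), (1.5, 0.2), (3.0, 0.4)]),
    ]
    stays_in_lane = lambda traj: all(abs(y) <= 0.5 for _, y in traj)
    progress = lambda traj: traj[-1][0] - traj[0][0]
    decision = arbitrate(planners, stays_in_lane, progress, lambda: [(0.0, 0.0)])
    print(decision.source)  # prints learned
    print("\n".join(decision.trace))
```

Because verification and scoring live outside the planners, the trace records why each candidate was kept or rejected, which is the sense in which a decision becomes traceable in this sketch.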
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Mosaic, an extensible framework for autonomous driving motion planning that composes rule-based and learned planners via arbitration graphs. It decouples trajectory generation from higher-level verification and unified scoring to improve safety, explainability, and performance. The central empirical claims are state-of-the-art results on nuPlan: 95.48 CLS-NR and 93.98 CLS-R on Val14 (30% fewer at-fault collisions than either planner alone) and 54.30 CLS-R on interPlan (23.3% better than the best constituent planner), achieved without retraining or new data.
Significance. If the arbitration mechanism proves robust, the work offers a practical, modular path to combine the predictability of rule-based planners with the adaptability of learned ones, addressing a key tension in the field. The open-source release and use of public benchmarks (nuPlan Val14, interPlan) are strengths that support reproducibility and allow direct comparison.
major comments (2)
- [§3, Arbitration graphs and unified scoring] The 30% collision reduction and SOTA claims on Val14/interPlan rest on the assumption that higher-level verification and scoring will reliably detect interactions and uncertainties missed by individual planners. The manuscript provides no formal coverage argument, exhaustive edge-case analysis, or closed-loop failure-mode enumeration showing the combined trajectory pool is closed under the failure modes of either planner alone; this is load-bearing for the safety claims in interactive scenarios.
- [§5, Experimental evaluation] The reported metrics (e.g., 95.48 CLS-NR, 93.98 CLS-R, 54.30 CLS-R) and collision reductions lack accompanying ablation studies isolating the contribution of verification versus scoring, statistical significance tests, or error analysis across scenario types. Without these, it is difficult to attribute gains specifically to the framework rather than benchmark variance or planner selection.
minor comments (2)
- [Abstract / §3] The abstract states that 'every decision becomes transparent and traceable,' but the main text should include a concrete example (with a figure or table) tracing a single decision through the arbitration graph to illustrate this property.
- [§3] Notation for the unified scoring function and arbitration graph edges is introduced without a compact summary table; adding one would improve readability for readers comparing to prior hybrid planners.
Simulated Authors' Rebuttal
Thank you for the constructive review of our manuscript on Mosaic. We appreciate the emphasis on strengthening the safety claims and experimental analysis. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [§3, Arbitration graphs and unified scoring] The 30% collision reduction and SOTA claims on Val14/interPlan rest on the assumption that higher-level verification and scoring will reliably detect interactions and uncertainties missed by individual planners. The manuscript provides no formal coverage argument, exhaustive edge-case analysis, or closed-loop failure-mode enumeration showing the combined trajectory pool is closed under the failure modes of either planner alone; this is load-bearing for the safety claims in interactive scenarios.
Authors: We agree that a formal coverage argument or exhaustive failure-mode enumeration would strengthen the safety claims. The current work presents Mosaic as an extensible empirical framework for composing planners rather than a formally verified system; the arbitration graph provides practical redundancy by requiring higher-level verification before selection, which empirically yields the reported collision reductions. We cannot supply a formal proof of closure under all failure modes without shifting the paper's scope to theoretical analysis. In revision we will add a dedicated limitations subsection to §3 explicitly discussing these assumptions and the reliance on empirical validation, plus expanded appendix material with additional closed-loop failure analysis drawn from the nuPlan Val14 and interPlan scenarios.
Revision: partial
- Referee: [§5, Experimental evaluation] The reported metrics (e.g., 95.48 CLS-NR, 93.98 CLS-R, 54.30 CLS-R) and collision reductions lack accompanying ablation studies isolating the contribution of verification versus scoring, statistical significance tests, or error analysis across scenario types. Without these, it is difficult to attribute gains specifically to the framework rather than benchmark variance or planner selection.
Authors: We thank the referee for this observation. The manuscript already compares the full Mosaic system against each constituent planner, but we accept that explicit ablations and statistical analysis would improve attribution. We will revise §5 to add: (i) targeted ablations separating the verification module from the unified scoring function, (ii) statistical significance testing via bootstrap confidence intervals on the key CLS scores and at-fault collision rates, and (iii) a breakdown of results by scenario category (interactive, non-interactive, and edge cases) from both benchmarks. These changes will clarify that the 30% collision reduction and SOTA scores arise from the composition framework rather than variance or planner choice alone.
Revision: yes
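The bootstrap confidence intervals mentioned in the rebuttal could be computed along the following lines. This is a minimal percentile-bootstrap sketch using only the Python standard library; the function name `bootstrap_ci`, its defaults, and the per-scenario scores in the usage example are assumptions for illustration, not the authors' procedure or data.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of per-scenario scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = random.Random(seed)
    n = len(scores)
    # Resample scenarios with replacement and record the mean of each resample.
    means: List[float] = sorted(
        mean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    # Made-up per-scenario scores in [0, 1]; not taken from the paper.
    example_scores = [1.0] * 90 + [0.0] * 10
    low, high = bootstrap_ci(example_scores)
    print(f"mean={mean(example_scores):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Resampling at the scenario level treats scenarios as the unit of variation, which seems a natural choice when the aggregate score is an average over scenarios. Comparing two planners on the same scenarios would call for a paired variant that resamples score differences instead.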
Circularity Check
No circularity: empirical framework evaluation on public benchmarks
The paper describes an extensible arbitration framework that combines existing rule-based and learned planners through verification, scoring, and selection steps. All performance claims (SOTA scores on Val14 and interPlan, collision reductions) are presented as direct outcomes of closed-loop simulation on public nuPlan-derived benchmarks using unmodified constituent planners, with no equations, fitted parameters, or self-citations invoked as load-bearing derivations. The central results are therefore falsifiable by re-running the same public benchmarks and do not reduce to any input by construction.
Axiom & Free-Parameter Ledger
invented entities (1)
- Arbitration graphs: no independent evidence