pith. machine review for the scientific record.

arxiv: 2604.13853 · v1 · submitted 2026-04-15 · 💻 cs.RO

Recognition: unknown

Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners

Christoph Stiller, Jan-Hendrik Pauls, Lingguang Wang, Marlon Steiner, Nick Le Large, Ömer Şahin Taş, Willi Poh


Pith reviewed 2026-05-10 13:26 UTC · model grok-4.3

classification 💻 cs.RO
keywords: planners, mosaic, trajectory, learned, rule-based, benchmark, best, cls-r

The pith

Mosaic uses arbitration graphs to combine rule-based and learned motion planners for safer autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Mosaic as a framework that integrates rule-based planners, which are predictable but limited, with learned planners, which adapt well but can be opaque. It does this by using arbitration graphs that separate the creation of possible trajectories from their checking and selection. This setup allows the system to pick the best trajectory from either type while keeping every choice traceable. If correct, it means autonomous vehicles can achieve higher reliability in complex traffic without needing to retrain models or add data, and with fewer safety incidents.

Core claim

Mosaic decouples trajectory generation from verification and scoring through arbitration graphs, enabling the transparent combination of rule-based and learned planners. The paper reports state-of-the-art closed-loop scores on nuPlan Val14, 30% fewer at-fault collisions than either planner in isolation, and better handling of the interactive interPlan scenarios, all without additional training.

What carries the argument

Arbitration graphs, which structure decision-making by verifying and scoring trajectories from multiple planners at a higher level.
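
As a minimal sketch of this verify-then-score pattern (not the authors' implementation), the selection step could look like the following Python. The `Candidate` type, the lane-bound check, the progress-based cost, and the planner names are hypothetical stand-ins; the paper's actual verifiers and scoring function are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Candidate:
    planner: str                     # hypothetical planner label
    trajectory: Tuple[Point, ...]    # toy trajectory as (x, y) points


# Hypothetical fallback used only when no candidate passes verification.
EMERGENCY_BRAKE = Candidate("emergency_brake", ((0.0, 0.0),))


def within_lane(c: Candidate, half_width: float = 2.0) -> bool:
    """Toy shared verifier: every point must stay inside the lane bounds."""
    return all(abs(y) <= half_width for _, y in c.trajectory)


def negative_progress(c: Candidate) -> float:
    """Toy unified cost: more longitudinal progress means lower cost."""
    return -c.trajectory[-1][0]


def arbitrate(
    candidates: Sequence[Candidate],
    verify: Callable[[Candidate], bool],
    cost: Callable[[Candidate], float],
    fallback: Candidate,
) -> Tuple[Candidate, List[str]]:
    """Verify every proposal first, then pick the cheapest valid one."""
    valid, rejected = [], []
    for c in candidates:
        if verify(c):
            valid.append(c)
        else:
            rejected.append(c.planner)
    if not valid:
        return fallback, rejected    # all planners failed verification
    return min(valid, key=cost), rejected


rule_based = Candidate("rule_based", ((0.0, 0.0), (5.0, 0.0), (10.0, 0.0)))
learned = Candidate("learned", ((0.0, 0.0), (6.0, 0.5), (12.0, 1.0)))
learned_unsafe = Candidate("learned_unsafe", ((0.0, 0.0), (7.0, 1.5), (14.0, 3.0)))

chosen, rejected = arbitrate(
    [rule_based, learned, learned_unsafe], within_lane, negative_progress, EMERGENCY_BRAKE
)
print(chosen.planner, rejected)
```

With these toy inputs the out-of-lane proposal is rejected before scoring, and the fallback is returned only when every candidate fails verification, which mirrors the redundancy argument made in the abstract.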

Load-bearing premise

The arbitration graphs and unified scoring reliably pick trajectories that combine strengths without creating new safety problems or reducing explainability.

What would settle it

Finding a driving scenario where the Mosaic planner causes a collision or unsafe behavior not seen in the individual rule-based or learned planners.

Figures

Figures reproduced from arXiv: 2604.13853 by Christoph Stiller, Jan-Hendrik Pauls, Lingguang Wang, Marlon Steiner, Nick Le Large, Ömer Şahin Taş, Willi Poh.

Figure 1: Mosaic combines complementary planners (e.g., rule-based and learned) within a structured and extensible framework. An arbitrator selects the best trajectory (green) from candidate proposals (blue) while a shared verification layer rejects unsafe plans (red), enabling safe and explainable motion planning. In this work, we propose a unified framework for motion planning based on arbitration graphs (AGs) [8], …

Figure 2: The proposed AG structure with 3 behavior components ( …

Figure 3: Behavior verification: One behavior component fails …

Figure 5: Behavior selection distribution of the cost arbitrator …
Original abstract

Safe and explainable motion planning remains a central challenge in autonomous driving. While rule-based planners offer predictable and explainable behavior, they often fail to grasp the complexity and uncertainty of real-world traffic. Conversely, learned planners exhibit strong adaptability but suffer from reduced transparency and occasional safety violations. We introduce Mosaic, an extensible framework for structured decision-making that integrates both paradigms through arbitration graphs. By decoupling trajectory verification and scoring from the generation of trajectories by individual planners, every decision becomes transparent and traceable. Trajectory verification at a higher level introduces redundancy between the planners, limiting emergency braking to the rare case where all planners fail to produce a valid trajectory. Through unified scoring and optimal trajectory selection, rule-based and learned planners with complementary strengths and weaknesses can be combined to yield the best of both worlds. In experimental evaluation on nuPlan, Mosaic achieves 95.48 CLS-NR and 93.98 CLS-R on the Val14 closed-loop benchmark, setting a new state of the art, while reducing at-fault collisions by 30% compared to either planner in isolation. On the interPlan benchmark, focused on highly interactive and difficult scenarios, Mosaic scores 54.30 CLS-R, outperforming its best constituent planner by 23.3% - all without retraining or requiring additional data. The code is available at github.com/KIT-MRT/mosaic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Mosaic, an extensible framework for autonomous driving motion planning that composes rule-based and learned planners via arbitration graphs. It decouples trajectory generation from higher-level verification and unified scoring to improve safety, explainability, and performance. The central empirical claims are state-of-the-art results on nuPlan: 95.48 CLS-NR and 93.98 CLS-R on Val14 (30% fewer at-fault collisions than either planner alone) and 54.30 CLS-R on interPlan (23.3% better than the best constituent planner), achieved without retraining or new data.

Significance. If the arbitration mechanism proves robust, the work offers a practical, modular path to combine the predictability of rule-based planners with the adaptability of learned ones, addressing a key tension in the field. The open-source release and use of public benchmarks (nuPlan Val14, interPlan) are strengths that support reproducibility and allow direct comparison.

major comments (2)
  1. [§3] §3 (Arbitration graphs and unified scoring): The 30% collision reduction and SOTA claims on Val14/interPlan rest on the assumption that higher-level verification and scoring will reliably detect interactions and uncertainties missed by individual planners. The manuscript provides no formal coverage argument, exhaustive edge-case analysis, or closed-loop failure-mode enumeration showing the combined trajectory pool is closed under the failure modes of either planner alone; this is load-bearing for the safety claims in interactive scenarios.
  2. [§5] §5 (Experimental evaluation): The reported metrics (e.g., 95.48 CLS-NR, 93.98 CLS-R, 54.30 CLS-R) and collision reductions lack accompanying ablation studies isolating the contribution of verification versus scoring, statistical significance tests, or error analysis across scenario types. Without these, it is difficult to attribute gains specifically to the framework rather than benchmark variance or planner selection.
minor comments (2)
  1. [Abstract / §3] The abstract states that 'every decision becomes transparent and traceable,' but the main text should include a concrete example (with a figure or table) tracing a single decision through the arbitration graph to illustrate this property.
  2. [§3] Notation for the unified scoring function and arbitration graph edges is introduced without a compact summary table; adding one would improve readability for readers comparing to prior hybrid planners.
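
A trace of the kind requested in minor comment 1 might look like the following sketch. It is purely illustrative: the `Evaluation` record, the planner names, and the cost values are hypothetical and are not taken from the paper or its code.

```python
from typing import List, NamedTuple, Optional


class Evaluation(NamedTuple):
    planner: str             # hypothetical planner label
    verified: bool           # outcome of the shared verification step
    cost: Optional[float]    # None when verification failed and no cost was computed


def trace_decision(evals: List[Evaluation]) -> List[str]:
    """Render one arbitration step as human-readable trace lines."""
    valid = [e for e in evals if e.verified]
    winner = min(valid, key=lambda e: e.cost).planner if valid else None
    lines = []
    for e in evals:
        if not e.verified:
            status = "rejected by verification"
        elif e.planner == winner:
            status = f"selected (cost {e.cost:.2f})"
        else:
            status = f"valid, not selected (cost {e.cost:.2f})"
        lines.append(f"{e.planner}: {status}")
    if winner is None:
        lines.append("fallback: emergency braking")
    return lines


example = [
    Evaluation("rule_based", True, 3.2),
    Evaluation("learned", True, 1.7),
    Evaluation("learned_alt", False, None),
]
trace = trace_decision(example)
print("\n".join(trace))
```

A table or figure in the main text carrying the same three columns (planner, verification outcome, score) for one real scenario would serve the same purpose.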

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review of our manuscript on Mosaic. We appreciate the emphasis on strengthening the safety claims and experimental analysis. We address each major comment below, indicating planned revisions where appropriate.

  1. Referee: [§3] §3 (Arbitration graphs and unified scoring): The 30% collision reduction and SOTA claims on Val14/interPlan rest on the assumption that higher-level verification and scoring will reliably detect interactions and uncertainties missed by individual planners. The manuscript provides no formal coverage argument, exhaustive edge-case analysis, or closed-loop failure-mode enumeration showing the combined trajectory pool is closed under the failure modes of either planner alone; this is load-bearing for the safety claims in interactive scenarios.

    Authors: We agree that a formal coverage argument or exhaustive failure-mode enumeration would strengthen the safety claims. The current work presents Mosaic as an extensible empirical framework for composing planners rather than a formally verified system; the arbitration graph provides practical redundancy by requiring higher-level verification before selection, which empirically yields the reported collision reductions. We cannot supply a formal proof of closure under all failure modes without shifting the paper's scope to theoretical analysis. In revision we will add a dedicated limitations subsection to §3 explicitly discussing these assumptions and the reliance on empirical validation, plus expanded appendix material with additional closed-loop failure analysis drawn from the nuPlan Val14 and interPlan scenarios. revision: partial

  2. Referee: [§5] §5 (Experimental evaluation): The reported metrics (e.g., 95.48 CLS-NR, 93.98 CLS-R, 54.30 CLS-R) and collision reductions lack accompanying ablation studies isolating the contribution of verification versus scoring, statistical significance tests, or error analysis across scenario types. Without these, it is difficult to attribute gains specifically to the framework rather than benchmark variance or planner selection.

    Authors: We thank the referee for this observation. The manuscript already compares the full Mosaic system against each constituent planner, but we accept that explicit ablations and statistical analysis would improve attribution. We will revise §5 to add: (i) targeted ablations separating the verification module from the unified scoring function, (ii) statistical significance testing via bootstrap confidence intervals on the key CLS scores and at-fault collision rates, and (iii) a breakdown of results by scenario category (interactive, non-interactive, and edge cases) from both benchmarks. These changes will clarify that the 30% collision reduction and SOTA scores arise from the composition framework rather than variance or planner choice alone. revision: yes
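
The bootstrap analysis promised in (ii) could be sketched as below. This is an illustrative percentile bootstrap over per-scenario scores; the function name `bootstrap_ci`, the resample count, the seed, and the synthetic scores are assumptions and not values or code from the paper.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean score."""
    rng = random.Random(seed)    # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample scenarios with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic per-scenario scores in [0, 1]; NOT results from the paper.
scores = [0.9, 1.0, 0.8, 1.0, 0.0, 0.95, 1.0, 0.7, 1.0, 0.85]
lo, hi = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Resampling at the scenario level treats scenarios as the unit of variation; whether that is the right unit for closed-loop benchmarks is a choice the revised §5 would need to justify.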

Circularity Check

0 steps flagged

No circularity: empirical framework evaluation on public benchmarks


The paper describes an extensible arbitration framework that combines existing rule-based and learned planners through verification, scoring, and selection steps. All performance claims (SOTA scores on Val14 and interPlan, collision reductions) are presented as direct outcomes of closed-loop simulation on public nuPlan-derived benchmarks using unmodified constituent planners, with no equations, fitted parameters, or self-citations invoked as load-bearing derivations. The central results are therefore falsifiable by re-running the same public benchmarks and do not reduce to any input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The framework introduces arbitration graphs as the core integration mechanism but does not rely on new mathematical axioms or fitted parameters beyond standard benchmark protocols.

invented entities (1)
  • Arbitration graphs (no independent evidence)
    purpose: To structure decision-making by decoupling trajectory verification and scoring from individual planner generation
    Presented as the extensible core of Mosaic for combining rule-based and learned planners.

pith-pipeline@v0.9.0 · 5572 in / 1224 out tokens · 32277 ms · 2026-05-10T13:26:47.095434+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 8 canonical work pages · 2 internal anchors

  [1] M. Treiber, A. Hennecke, and D. Helbing, “Congested Traffic States in Empirical Observations and Microscopic Simulations,” Physical Review E, 2000.
  [2] D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with Misconceptions about Learning-based Vehicle Motion Planning,” in CoRL, 2023.
  [3] J. Cheng, Y. Chen, and Q. Chen, “PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving,” arXiv:2404.14327, 2024.
  [4] B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, et al., “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,” in CVPR, 2025.
  [5] L. Wang, Ö. Ş. Taş, M. Steiner, and C. Stiller, “Flowdrive: Moderated flow matching with data balancing for trajectory planning,” arXiv:2509.21961, 2025.
  [6] M. Cusumano-Towner, D. Hafner, A. Hertzberg, B. Huval, A. Petrenko, E. Vinitsky, et al., “Robust autonomy emerges from self-play,” in ICML, 2025.
  [7] Z. Huang, H. Liu, and C. Lv, “GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving,” in ICCV, 2023.
  [8] M. Lauer, R. Hafner, S. Lange, and M. Riedmiller, “Cognitive concepts in autonomous soccer playing robots,” Cognitive Systems Research, 2010.
  [9] P. Spieker, N. Le Large, and M. Lauer, “Better safe than sorry: Enhancing arbitration graphs for safe and robust autonomous decision-making,” in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2025.
  [10] Ö. Ş. Taş, P. H. Brusius, and C. Stiller, “Decision-theoretic MPC: Motion Planning with Weighted Maneuver Preferences Under Uncertainty,” arXiv:2310.17963, 2023.
  [11] J. Cheng, Y. Chen, X. Mei, B. Yang, B. Li, and M. Liu, “Rethinking imitation-based planners for autonomous driving,” in ICRA, 2024.
  [12] Z. Zhou, H. Haibo, X. Chen, J. Wang, N. Guan, K. Wu, et al., “Behaviorgpt: Smart agent simulation for autonomous driving with next-patch prediction,” NeurIPS, 2024.
  [13] W. Wu, X. Feng, Z. Gao, and Y. Kan, “Smart: Scalable multi-agent real-time motion generation via next-token prediction,” NeurIPS, 2024.
  [14] F. Konstantinidis, M. Sackmann, U. Hofmann, and C. Stiller, “Graph-Based Adversarial Imitation Learning for Predicting Human Driving Behavior,” in IEEE Intelligent Vehicles Symposium (IV), 2024.
  [15] P. Li and D. Cui, “Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving,” arXiv:2409.18341, 2025.
  [16] C. Yuan, Z. Zhang, J. Sun, S. Sun, Z. Huang, C. D. W. Lee, et al., “DRAMA: An Efficient End-to-end Motion Planner for Autonomous Driving with Mamba,” arXiv:2408.03601, 2024.
  [17] Y. Chen, Y. Wang, and Z. Zhang, “Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers,” in CVPR, 2025.
  [18] H. Nishimura, J. Mercat, B. Wulfe, R. T. McAllister, and A. Gaidon, “RAP: Risk-Aware Prediction for Robust Planning,” in CoRL, 2023.
  [19] Z. Huang, X. Weng, M. Igl, Y. Chen, Y. Cao, B. Ivanovic, et al., “Gen-drive: Enhancing diffusion generative driving policies with reward modeling and reinforcement learning fine-tuning,” in ICRA, 2025.
  [20] Y. Hu, S. Chai, Z. Yang, J. Qian, K. Li, W. Shao, et al., “Solving motion planning tasks with a scalable generative model,” in ECCV, 2024.
  [21] Y. Zheng, R. Liang, K. Zheng, J. Zheng, L. Mao, J. Li, et al., “Diffusion-based planning for autonomous driving with flexible guidance,” in ICLR, 2025.
  [22] Y. Chen, Z.-h. Ding, Z. Wang, Y. Wang, L. Zhang, and S. Liu, “Asynchronous large language model enhanced planner for autonomous driving,” in ECCV, 2024.
  [23] B. Li, Y. Wang, J. Mao, B. Ivanovic, S. Veer, K. Leung, et al., “Driving everywhere with large language model policy adaptation,” in CVPR, 2024.
  [24] C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, et al., “Drivelm: Driving with graph visual question answering,” in ECCV, 2024.
  [25] C. Pan, B. Yaman, T. Nesti, A. Mallik, A. G. Allievi, S. Velipasalar, et al., “Vlp: Vision language planning for autonomous driving,” in CVPR, 2024.
  [26] J. Fischer, M. Steiner, Ö. Ş. Taş, and C. Stiller, “Safety reinforced model predictive control (srmpc): Improving mpc with reinforcement learning for motion planning in autonomous driving,” in ITSC, 2023.
  [27] P. F. Orzechowski, C. Burger, and M. Lauer, “Decision-making for automated vehicles using a hierarchical behavior-based arbitration scheme,” in IV, 2020.
  [28] H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, et al., “Nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv:2106.11810, 2021.
  [29] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,” arXiv:2210.02747, 2022.
  [30] X. Liu, C. Gong, and Q. Liu, “Flow straight and fast: Learning to generate and transfer data with rectified flow,” arXiv:2209.03003, 2022.
  [31] W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in CVPR, 2023.
  [32] M. Hallgarten, J. Zapata, M. Stoll, K. Renz, and A. Zell, “Can vehicle motion planning generalize to realistic long-tail scenarios?” in IROS, 2024.