pith. machine review for the scientific record.

arxiv: 2604.27994 · v1 · submitted 2026-04-30 · 💻 cs.RO

Recognition: unknown

Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 05:06 UTC · model grok-4.3

classification 💻 cs.RO
keywords zero-shot transfer · CARLA simulator · semantic rollout · town-adversarial regularization · Dreamer world model · fixed-route driving · latent state invariance · autonomous driving generalization

The pith

Semantic rollout supervision and town-adversarial regularization improve mean held-out-town route completion in CARLA

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies zero-shot transfer of a closed-loop fixed-route driving agent in the CARLA simulator from training towns to two unseen towns while keeping weather fixed and removing traffic and pedestrians. It starts from a Dreamer-style latent world model and adds two training-only losses: multi-horizon prediction of future visual-semantic embeddings during imagined rollouts and town-adversarial supervision on a semantic projection of the recurrent latent state. A causal context feature conditions the rollout predictor, while the actor and critic use the standard control feature. The policy receives only camera images and no map, route, or goal information. The resulting agent records the highest mean success rate among the tested Dreamer-family methods on the held-out towns.
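The first of the two training-only losses can be made concrete with a small sketch. The numpy rendering below is hypothetical: the cosine distance, the embedding shapes, and the function names are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def semantic_rollout_loss(pred_embeddings, target_embeddings):
    """Hypothetical multi-horizon semantic rollout loss.

    pred_embeddings:   (H, D) embeddings predicted from imagined latent
                       states at horizons 1..H, conditioned on the causal
                       context feature (conditioning is not shown here).
    target_embeddings: (H, D) visual-semantic embeddings of the observed
                       future frames (e.g. from a frozen image encoder).

    Returns the mean cosine distance across horizons -- one plausible
    choice of distance; the paper's exact loss may differ.
    """
    p = pred_embeddings / np.linalg.norm(pred_embeddings, axis=-1, keepdims=True)
    t = target_embeddings / np.linalg.norm(target_embeddings, axis=-1, keepdims=True)
    cosine_sim = np.sum(p * t, axis=-1)      # (H,) similarity per horizon
    return float(np.mean(1.0 - cosine_sim))  # 0 when perfectly aligned

# A prediction identical to its target incurs zero loss.
H, D = 5, 16
rng = np.random.default_rng(0)
emb = rng.normal(size=(H, D))
assert abs(semantic_rollout_loss(emb, emb)) < 1e-9
```

The loss is averaged over horizons so that short- and long-range prediction errors are weighted equally; whether the paper weights horizons uniformly is not stated here.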

Core claim

In this controlled fixed-weather CARLA setting with no traffic or pedestrians, training a Dreamer-style agent with multi-horizon prediction of future visual-semantic embeddings along imagined rollouts plus town-adversarial supervision on a semantic projection of the recurrent latent state yields the highest mean success rate on fixed routes in held-out towns among the compared Dreamer-family methods.

What carries the argument

Multi-horizon semantic embedding prediction along imagined rollouts conditioned on causal context features, combined with town-adversarial regularization applied to a semantic projection of the recurrent latent state.
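Town-adversarial supervision of this kind is typically implemented with a gradient-reversal layer in the style of domain-adversarial training. A minimal numpy sketch, assuming a two-town setup and a linear logistic town classifier on the semantic projection (the paper's classifier head may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def town_classifier_grads(z, w, label):
    """Gradients of binary cross-entropy for a linear town classifier.

    z: (D,) semantic projection of the recurrent latent state.
    w: (D,) classifier weights.  label: 0 or 1, the training-town id.
    Returns (grad_w, grad_z).
    """
    p = sigmoid(z @ w)
    err = p - label          # d(BCE)/d(logit) for a sigmoid head
    return err * z, err * w

def reversed_feature_grad(grad_z, lam=1.0):
    """Gradient-reversal layer: identity on the forward pass, sign-flipped
    and scaled gradient on the backward pass, so the world model is pushed
    to make the town *harder* to classify from the latent state."""
    return -lam * grad_z

rng = np.random.default_rng(1)
z, w = rng.normal(size=8), rng.normal(size=8)
grad_w, grad_z = town_classifier_grads(z, w, label=1)
# The classifier descends grad_w; the representation receives -grad_z.
assert np.allclose(reversed_feature_grad(grad_z), -grad_z)
```

The design intent is exactly the invariance argument above: the classifier is trained to identify the town, while the reversed gradient trains the latent state to discard town-specific cues.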

If this is right

  • The combined auxiliary losses produce higher mean route completion on fixed routes in unseen towns.
  • Safety and lane-keeping metrics show mixed results across the evaluated held-out towns.
  • The policy operates without navigation commands, route polylines, goal poses, or map input at inference time.
  • The reference route is used only by the environment to compute reward, progress, success, and termination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the auxiliary losses are the source of the gains, the same regularizers could be tested for robustness under weather changes or added traffic.
  • The separation between causal context features and control features may allow the world model to learn town-invariant semantics while the policy focuses on driving actions.
  • Reproducing the exact baseline implementations with only the added losses removed would confirm whether the reported improvements are directly attributable to the proposed objectives.

Load-bearing premise

The observed gains in held-out-town success rates are caused by the semantic rollout supervision and town-adversarial regularization rather than by other unstated differences in implementation, hyperparameters, or training schedule between the proposed model and the Dreamer baselines.

What would settle it

Re-implement all Dreamer baselines inside the exact same codebase and training schedule as the proposed model, differing only by the absence of the two auxiliary losses, then check whether the success-rate advantage on held-out towns disappears.
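That protocol amounts to a four-way ablation over one shared configuration, with only the two loss switches varying. A sketch, with every name and value illustrative rather than taken from the paper:

```python
# Shared base configuration; the concrete values here are placeholders,
# not the paper's hyperparameters.
base_config = {
    "recurrent_state_size": 512,
    "imagination_horizon": 15,
    "optimizer": "adam",
    "learning_rate": 3e-4,
    "total_training_steps": 1_000_000,
}

# Only the auxiliary-loss switches differ between variants.
variants = {
    "dreamer_baseline": {"semantic_rollout": False, "town_adversarial": False},
    "semantic_only":    {"semantic_rollout": True,  "town_adversarial": False},
    "adversarial_only": {"semantic_rollout": False, "town_adversarial": True},
    "proposed_full":    {"semantic_rollout": True,  "town_adversarial": True},
}

runs = [{**base_config, **switches, "variant": name}
        for name, switches in variants.items()]

# Every run shares the base hyperparameters exactly; any success-rate
# delta is then attributable to the switches alone.
assert all(r["recurrent_state_size"] == 512 for r in runs)
```

If the advantage of `proposed_full` over `dreamer_baseline` survives this grid, the load-bearing premise holds; if it vanishes, the gains were implementation artifacts.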

Figures

Figures reproduced from arXiv: 2604.27994 by Feeza Khan Khanzada, Jaerock Kwon.

Figure 1: Overview of the proposed framework. The model is trained on source-domain rollouts using a world model, a semantic auxiliary …
Figure 2: Route maps for the towns used in our experiments. Town06 …
Figure 3: Failure-mode composition on held-out Town03 and Town04, pooled across the already evaluated episodes from the three training seeds.
Figure 4: Held-out-town success rates on Town03 and Town04, regrouped by whether the actor and critic consume the standard Dreamer feature …
read the original abstract

Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the CARLA simulator: zero-shot transfer of a closed-loop fixed-route driving agent from Town05 and Town06 to unseen Town03 and Town04. The study isolates structural town shift by keeping weather fixed to ClearNoon and removing traffic and pedestrians. We build on a Dreamer-style latent world-model agent and add two training-only auxiliary losses: multi-horizon prediction of future visual-semantic embeddings along imagined rollouts and town-adversarial supervision on a semantic projection of the recurrent latent state. A causal context feature conditions the semantic rollout predictor, while the actor and critic retain the standard control feature. The policy receives no navigation command, route polyline, goal pose, or map input; the reference route is used only by the environment for reward, progress, success, and termination. Across the evaluated held-out towns, the proposed model achieves the highest mean success rate among the included Dreamer-family methods. Secondary safety and lane-keeping metrics are mixed across towns. These results support a bounded conclusion: in this controlled fixed-weather CARLA setting, semantic rollout supervision combined with town-adversarial regularization improves mean held-out-town route completion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript presents a Dreamer-style latent world-model agent for closed-loop fixed-route driving in CARLA. It augments training with two auxiliary losses: multi-horizon prediction of future visual-semantic embeddings along imagined rollouts and town-adversarial supervision on a semantic projection of the recurrent latent state. A causal context feature conditions the rollout predictor while the actor and critic use the standard control feature. The policy receives no navigation commands, route polylines, or maps; the reference route is used only for environment reward, progress, success, and termination. Training occurs on Town05 and Town06; zero-shot evaluation is performed on held-out Town03 and Town04 under fixed ClearNoon weather with no traffic or pedestrians. The central claim is that the proposed model achieves the highest mean success rate among included Dreamer-family methods, supporting the utility of the two auxiliary losses for this bounded town-shift generalization task.

Significance. If the empirical gains are shown to arise specifically from the auxiliary losses under identical base implementations, the work would demonstrate that semantic rollout supervision and town-adversarial regularization can improve zero-shot route completion for latent world-model agents in a controlled CARLA setting. The deliberate isolation of structural town shift (fixed weather, no dynamic agents) is a strength that allows clearer attribution than typical multi-factor transfer studies. However, the highly bounded scenario limits how directly the findings transfer to more realistic conditions involving weather variation or traffic. Keeping policy inputs unchanged while adding training-only losses is a sound design choice that supports a fair evaluation of generalization.

major comments (1)
  1. The central claim attributes higher mean held-out-town success rates to the addition of semantic rollout supervision and town-adversarial regularization. This attribution is load-bearing only if the base Dreamer agent is identical across comparisons. The manuscript must explicitly confirm (via statement or hyperparameter table in the Experimental Setup section) that recurrent state size, world-model architecture, actor-critic, optimizer, learning-rate schedule, imagination horizon, and total training steps are held constant; any unstated deviation would confound the success-rate delta and render it uninterpretable as evidence for the auxiliary losses.
minor comments (3)
  1. The abstract states that auxiliary losses improve route completion and that secondary safety/lane-keeping metrics are mixed, but does not report numerical values, standard deviations, or effect sizes. Adding a concise summary of key metrics (e.g., mean success rates per town) would improve readability.
  2. The distinction between the causal context feature supplied to the semantic rollout predictor and the standard control feature used by the actor/critic should be illustrated with a diagram or pseudocode in the Methods section to clarify information flow.
  3. Figure captions for any success-rate or metric plots should explicitly list the compared methods and the exact evaluation towns to avoid ambiguity.
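The first minor comment's request for means and standard deviations reduces to a per-seed aggregation. A sketch, assuming results are tabulated as a seeds × routes matrix of 0/1 outcomes (the shape convention is an assumption about how the paper would report them):

```python
import numpy as np

def summarize_success(success_matrix):
    """Mean and sample standard deviation of per-seed success rates.

    success_matrix: (num_seeds, num_routes) array with 1 for a completed
    route and 0 otherwise.  Averaging within a seed first, then across
    seeds, matches the 'mean success rate per town' the referee asks for.
    """
    per_seed = np.asarray(success_matrix, dtype=float).mean(axis=1)
    return per_seed.mean(), per_seed.std(ddof=1)

# Three seeds, four routes each, all at 3/4 completion.
mean, std = summarize_success([[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0]])
assert abs(mean - 0.75) < 1e-12 and abs(std) < 1e-12
```

Reporting the seed-level standard deviation (rather than the route-level one) is the conservative choice, since seeds, not routes, are the independent replicates.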

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the importance of explicit confirmation that base implementations are identical. We agree that this is necessary to support attribution of performance gains to the auxiliary losses. We address the major comment below and will incorporate the requested clarification in the revised manuscript.

read point-by-point responses
  1. Referee: The central claim attributes higher mean held-out-town success rates to the addition of semantic rollout supervision and town-adversarial regularization. This attribution is load-bearing only if the base Dreamer agent is identical across comparisons. The manuscript must explicitly confirm (via statement or hyperparameter table in the Experimental Setup section) that recurrent state size, world-model architecture, actor-critic, optimizer, learning-rate schedule, imagination horizon, and total training steps are held constant; any unstated deviation would confound the success-rate delta and render it uninterpretable as evidence for the auxiliary losses.

    Authors: We agree that the attribution requires explicit confirmation of identical base implementations. In the experiments, the underlying Dreamer agent (including recurrent state size, world-model architecture, actor-critic networks, optimizer, learning-rate schedule, imagination horizon, and total training steps) is held constant across all compared methods; the only differences are the addition of the semantic rollout prediction loss and town-adversarial regularization for the proposed model. All methods are implemented within the same codebase and training pipeline. To make this fully transparent, we will add a hyperparameter table to the Experimental Setup section of the revised manuscript that enumerates these shared values and includes an explicit statement confirming that no other deviations exist between the base agent and the augmented variants. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of auxiliary losses in RL agent

full rationale

The paper is an empirical RL study that reports higher mean held-out-town route completion when semantic-rollout supervision and town-adversarial regularization are added to a Dreamer-style base agent. No mathematical derivation chain exists; the central claim rests on experimental outcomes rather than any equation or prediction that reduces by construction to a fitted parameter or self-citation. The base architecture is referenced to prior Dreamer work by unrelated authors, and all reported metrics are externally measured success rates in the CARLA simulator. No self-definitional, fitted-input-as-prediction, or load-bearing self-citation patterns are present. The result is therefore self-contained as a controlled experimental comparison.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The work rests on standard reinforcement-learning assumptions and simulator properties; no new physical entities are postulated.

free parameters (1)
  • auxiliary loss coefficients
    Weights balancing the semantic rollout and adversarial terms against the standard Dreamer objective are chosen during training.
axioms (2)
  • domain assumption CARLA towns differ only in static layout while sharing identical rendering and physics under ClearNoon
    Used to isolate structural town shift as the sole domain gap.
  • standard math The latent state learned by Dreamer is Markovian given the chosen observation and action history
    Required for the imagined rollouts and recurrent state to be sufficient statistics.
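Since the ledger lists only the auxiliary loss coefficients as free parameters, the full training objective presumably combines three terms. A sketch, where the linear combination and the default weights are assumptions (the ledger says only that the weights are chosen during training):

```python
def total_loss(dreamer_loss, semantic_loss, adversarial_loss,
               w_semantic=1.0, w_adversarial=0.1):
    """One plausible weighting of the full objective.  The linear form
    and the default coefficients are illustrative, not the paper's."""
    return dreamer_loss + w_semantic * semantic_loss + w_adversarial * adversarial_loss

# Zeroing both auxiliary terms recovers the plain Dreamer objective,
# which is what the baseline ablation relies on.
assert total_loss(1.0, 0.0, 0.0) == 1.0
```

How sensitive the held-out-town gains are to these two coefficients is exactly the kind of sweep the "what would settle it" protocol could include.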

pith-pipeline@v0.9.0 · 5538 in / 1676 out tokens · 131915 ms · 2026-05-07T05:06:21.515433+00:00 · methodology

discussion (0)

