ECoSim: Data Efficient Fine-Tuning for Controllable Traffic Simulation
Pith reviewed 2026-07-02 15:04 UTC · model grok-4.3
The pith
Lightweight adaptation adds multi-modal control to pretrained traffic simulators using less than 1% of the paired control data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modulating intermediate features through identity-initialized FiLM layers, the method efficiently adds new control modalities while preserving the base model's generative prior. On the Waymo Open Sim Agents Challenge it achieves strong controllability with less than 1% of the paired control data, and context-aware condition transfer enables counterfactual scenario generation and long-tail synthesis while maintaining stable closed-loop driving realism and safety.
What carries the argument
Identity-initialized FiLM layers that modulate intermediate features of pretrained diffusion and autoregressive models to incorporate new control signals.
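As a minimal sketch of what "identity-initialized" can mean, the snippet below assumes a residual parameterization in which the scale is 1 plus a zero-initialized linear projection of the control vector and the shift is a second zero-initialized projection. The class name `IdentityFiLM`, the weight names, and the parameterization itself are illustrative assumptions; this summary does not state the paper's exact formulation.

```python
class IdentityFiLM:
    """Hypothetical FiLM layer: out = (1 + W_gamma c) * h + W_beta c.

    Both projection matrices start at zero (an assumption for illustration),
    so the layer returns h unchanged for any control vector c at initialization.
    """

    def __init__(self, feature_dim, cond_dim):
        self.w_gamma = [[0.0] * cond_dim for _ in range(feature_dim)]
        self.w_beta = [[0.0] * cond_dim for _ in range(feature_dim)]

    def __call__(self, h, c):
        out = []
        for i, h_i in enumerate(h):
            # Per-feature scale offset and shift, both linear in the control.
            d_gamma = sum(w * c_j for w, c_j in zip(self.w_gamma[i], c))
            beta = sum(w * c_j for w, c_j in zip(self.w_beta[i], c))
            out.append((1.0 + d_gamma) * h_i + beta)
        return out


film = IdentityFiLM(feature_dim=4, cond_dim=3)
h = [0.3, -1.2, 2.0, 0.0]   # stand-in for an intermediate feature vector
c = [1.0, -2.0, 0.5]        # stand-in for an encoded control signal

out_at_init = film(h, c)    # equals h: the base model's features pass through

film.w_beta[0][0] = 0.5     # pretend fine-tuning moved a single weight
out_after_update = film(h, c)  # only feature 0 now responds to the control
```

The point of the design is that the adapted network starts out computing exactly what the base model computes, so any change in behaviour has to be learned from the control data rather than introduced by the new layers themselves.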
If this is right
- Multi-modal controllability becomes available through sketch, latent behavior codes, and text inputs.
- Counterfactual scenario generation is enabled via context-aware condition transfer.
- Long-tail event synthesis is supported without retraining the full model.
- Closed-loop driving realism and safety metrics remain stable after adaptation.
Where Pith is reading between the lines
- The method could be applied to other pretrained generative models in simulation domains beyond traffic.
- Reduced data requirements may allow faster iteration on scenario libraries for autonomous driving validation.
- Condition transfer techniques might extend to mixing controls across different model architectures.
Load-bearing premise
Inserting identity-initialized FiLM layers into intermediate features of pretrained models does not meaningfully degrade the base generative prior or closed-loop realism.
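A short worked bound shows why this premise is plausible but not automatic. It assumes a residual FiLM parameterization with scale offset $\Delta\gamma(c)$ and shift $\beta(c)$, both zero at initialization; these symbols are illustrative and not taken from the paper.

```latex
\[
\mathrm{FiLM}(h; c) = \bigl(1 + \Delta\gamma(c)\bigr) \odot h + \beta(c)
\]
\[
\Delta\gamma(c) = 0,\quad \beta(c) = 0
\;\Longrightarrow\;
\mathrm{FiLM}(h; c) = h
\]
\[
\lVert \mathrm{FiLM}(h; c) - h \rVert_2
= \lVert \Delta\gamma(c) \odot h + \beta(c) \rVert_2
\le \lVert \Delta\gamma(c) \rVert_\infty \, \lVert h \rVert_2 + \lVert \beta(c) \rVert_2
\]
```

Under this reading the premise holds exactly before fine-tuning and only approximately afterwards. How far the adapted features drift depends on how large $\Delta\gamma$ and $\beta$ grow during training, which is an empirical question.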
What would settle it
If the adapted model scores significantly lower on closed-loop realism or safety than the unmodified base model when both are evaluated on the Waymo Open Sim Agents Challenge benchmark, the claim that the generative prior is preserved would be falsified.
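One way such a comparison could be run is a paired bootstrap over per-scenario scores, sketched below. The function name `paired_bootstrap_ci`, the decision rule, and all scores are hypothetical; the numbers are synthetic placeholders, not results from the paper or the benchmark.

```python
import random
import statistics


def paired_bootstrap_ci(base_scores, adapted_scores,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean_diff, lo, hi) for adapted minus base over paired scenarios."""
    if len(base_scores) != len(adapted_scores):
        raise ValueError("scores must be paired per scenario")
    diffs = [a - b for a, b in zip(adapted_scores, base_scores)]
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample scenarios with replacement and record the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi


# Synthetic placeholder scores (higher = more realistic), NOT real results.
rng = random.Random(1)
base = [rng.uniform(0.6, 0.9) for _ in range(200)]
adapted = [b + rng.gauss(0.0, 0.01) for b in base]  # adapted model, null control

mean_diff, lo, hi = paired_bootstrap_ci(base, adapted)
# Hypothetical decision rule: an interval lying entirely below zero would
# count as evidence that adaptation degraded the base model.
degraded = hi < 0.0
```

Pairing by scenario matters here because realism scores vary far more across scenarios than between the two models, so an unpaired comparison could hide a small but consistent drop.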
Original abstract
Controllable traffic simulation is critical for testing autonomous driving systems, yet existing approaches often require retraining large generative models with extensive annotated data. We introduce a lightweight control adaptation framework that enables multi-modal controllability (sketch, latent behavior codes, and text) for pretrained state-of-the-art diffusion and autoregressive traffic models. By modulating intermediate features through identity-initialized FiLM layers, our method efficiently adds new control modalities while preserving the base model's generative prior. Evaluated on Waymo Open Sim Agents Challenge, our approach demonstrates strong controllability with less than 1% of the paired control data. Through context-aware condition transfer, our framework enables counterfactual scenario generation and long-tail synthesis while maintaining stable closed-loop driving realism and safety. Our framework unlocks new possibilities for controllable traffic simulation, enabling targeted scenario generation through lightweight adaptation of pretrained generative models. Project page: https://ecosim-web.github.io/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ECoSim, a lightweight adaptation method that inserts identity-initialized FiLM layers into intermediate features of pretrained diffusion and autoregressive traffic generators. This enables multi-modal control (sketch, latent codes, text) using <1% paired control data on the Waymo Open Sim Agents Challenge, while supporting counterfactual and long-tail scenario generation and preserving closed-loop realism and safety.
Significance. If the preservation of the base generative prior and closed-loop metrics is quantitatively verified, the result would meaningfully advance data-efficient controllability for traffic simulation, reducing reliance on large annotated datasets for testing autonomous driving systems.
Major comments (2)
- [Abstract, §3] Abstract and §3 (method description): the central claim that identity-initialized FiLM layers preserve the original generative prior after fine-tuning on <1% data is load-bearing, yet no quantitative comparison is provided (e.g., collision rate, realism score, or distribution distance) between the adapted model evaluated at the identity control mapping and the untouched pretrained baseline.
- [§4] §4 (experiments): the reported controllability gains and closed-loop safety metrics must be accompanied by an explicit ablation showing that the same metrics remain statistically indistinguishable from the base model when control inputs are set to zero/identity; without this, the preservation assertion cannot be assessed.
Minor comments (2)
- [Abstract] The abstract states performance claims without any numerical values; move at least the key controllability and realism numbers into the abstract for immediate readability.
- [§3] Clarify the exact definition of the identity initialization for the FiLM scale and shift parameters and confirm whether any regularization is applied during the <1% fine-tuning to limit drift from the base feature statistics.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing the need for explicit quantitative verification of generative prior preservation. We agree this is a load-bearing claim and will add the requested comparisons and ablation in the revised manuscript.
Point-by-point responses
- Referee: [Abstract, §3] Abstract and §3 (method description): the central claim that identity-initialized FiLM layers preserve the original generative prior after fine-tuning on <1% data is load-bearing, yet no quantitative comparison is provided (e.g., collision rate, realism score, or distribution distance) between the adapted model evaluated at the identity control mapping and the untouched pretrained baseline.
  Authors: We acknowledge that the manuscript does not currently include direct quantitative comparisons (collision rate, realism score, distribution distance) between the adapted model at identity mapping and the untouched baseline. Although the design (identity initialization plus <1% data) is intended to preserve the prior, this evidence is missing. We will add these comparisons to §4 in the revision.
  Revision: yes
- Referee: [§4] §4 (experiments): the reported controllability gains and closed-loop safety metrics must be accompanied by an explicit ablation showing that the same metrics remain statistically indistinguishable from the base model when control inputs are set to zero/identity; without this, the preservation assertion cannot be assessed.
  Authors: We agree that an explicit ablation demonstrating statistical indistinguishability on closed-loop metrics when controls are set to identity/zero is required. We will insert this ablation into §4, reporting the relevant metrics and adding statistical comparisons against the base model.
  Revision: yes
Circularity Check
No circularity detected; claims rest on external evaluation
The provided abstract and description contain no equations, derivations, or self-referential steps that reduce the controllability claim or preservation of the generative prior to a fitted quantity defined by the method itself. The framework is described as an empirical adaptation technique evaluated on the external Waymo Open Sim Agents Challenge benchmark, with no load-bearing self-citations, uniqueness theorems, or ansatzes invoked within the text. This is the expected non-finding for a methods paper lacking visible mathematical chains.