
arxiv: 2606.23588 · v1 · pith:SKOQYD5Bnew · submitted 2026-06-22 · 💻 cs.RO · cs.AI

A Generative Model for Closed-Loop Microsimulation of Signalized Intersections

Pith reviewed 2026-06-26 08:23 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords generative model · closed-loop microsimulation · signalized intersections · transformer attention · vehicle trajectory prediction · traffic simulation · SUMO

The pith

Enactor generates vehicle trajectories at signalized intersections that match a SUMO simulator's speed and travel-time distributions with over ten times lower KL divergence than a transformer baseline while cutting red-light violations by m

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Enactor, a generative model that treats each vehicle as an actor whose next motion is drawn from a learned distribution. It encodes actors and lanes in polar coordinates around the intersection center and feeds them to a transformer that separates spatial and temporal attention. Training proceeds through a closed-loop curriculum that forces the model to use its own prior outputs as input. In full simulation-in-the-loop tests lasting 4000 seconds with continuously refreshed actors, the model recovers the original data generator's distributions far more closely than a recent transformer baseline and produces far fewer safety violations. The same architecture also outperforms a constant-velocity predictor on naturalistic trajectories recorded at a real intersection.
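The difference between feeding a one-step predictor recorded states and feeding it its own outputs can be illustrated with a toy rollout. This is a minimal sketch under stated assumptions, not the paper's training code: `rollout`, `toy_step`, and `p_self_schedule` are hypothetical names, the linear ramp is an assumed schedule, and the mixing rule shown is scheduled sampling, used here as a stand-in because the paper's curriculum is not detailed on this page.

```python
import numpy as np

def p_self_schedule(epoch, n_epochs):
    # Assumed linear ramp from teacher forcing (0.0) to fully closed loop (1.0).
    return min(1.0, epoch / max(1, n_epochs - 1))

def rollout(step_fn, history, ground_truth, p_self, rng):
    """Roll a one-step predictor forward over len(ground_truth) steps.

    With probability p_self the model's own output is appended to the input
    window (closed loop); otherwise the recorded state is used (teacher forcing).
    """
    window = list(history)
    outputs = []
    for target in ground_truth:
        pred = step_fn(np.asarray(window))
        outputs.append(pred)
        fed_back = pred if rng.random() < p_self else target
        window = window[1:] + [fed_back]
    return np.asarray(outputs)

def toy_step(window):
    # Hypothetical stand-in for a learned model: linear extrapolation plus a small bias.
    return window[-1] + (window[-1] - window[-2]) + 0.05

rng = np.random.default_rng(0)
truth = np.arange(2.0, 12.0)   # recorded states 2, 3, ..., 11
history = [0.0, 1.0]           # two observed states before the rollout

open_loop = rollout(toy_step, history, truth, p_self=0.0, rng=rng)
closed_loop = rollout(toy_step, history, truth, p_self=1.0, rng=rng)
err_open = np.abs(open_loop - truth)
err_closed = np.abs(closed_loop - truth)
```

In this toy setting the teacher-forced error stays at the per-step bias, while the closed-loop error compounds with each step. That compounding is the failure mode a closed-loop curriculum is meant to expose during training.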

Core claim

Enactor is an actor-centric generative model that encodes dynamic vehicles and lane polylines in polar coordinates referenced to the intersection center, uses a transformer with separate spatial and temporal attention blocks to output a distribution over each actor's next-step motion (s, α), and is trained with a closed-loop curriculum so that it learns to condition on its own predictions; when run in full closed-loop simulation-in-the-loop against a refreshing actor set it recovers the SUMO generator's speed and travel-time distributions with KL divergence more than an order of magnitude lower than a recent transformer baseline on travel time and roughly 5× lower on speed at one site, while

What carries the argument

Actor-centric transformer that predicts per-vehicle (s, α) distributions from polar-encoded actors and lanes, trained with closed-loop curriculum on its own rollouts.
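As a rough illustration of the polar encoding described above, the sketch below converts Cartesian positions of actors and lane-polyline vertices into (r, theta) about an intersection center. The function name `to_polar`, the example coordinates, and the angle convention (counter-clockwise from the +x axis, in radians) are assumptions for illustration. The page does not state the paper's reference frame.

```python
import numpy as np

def to_polar(points_xy, center_xy):
    """Convert Cartesian points of shape (N, 2) to polar (r, theta) about center_xy."""
    offsets = np.asarray(points_xy, dtype=float) - np.asarray(center_xy, dtype=float)
    r = np.hypot(offsets[:, 0], offsets[:, 1])
    # Assumed convention: angle measured from the +x axis, in radians within [-pi, pi].
    theta = np.arctan2(offsets[:, 1], offsets[:, 0])
    return np.stack([r, theta], axis=1)

# Hypothetical scene: intersection center, two vehicles, one approach-lane polyline.
center = (100.0, 50.0)
vehicle_positions = [(110.0, 50.0), (100.0, 70.0)]
lane_polyline = [(100.0, 0.0), (100.0, 25.0), (100.0, 45.0)]

actors_polar = to_polar(vehicle_positions, center)
lane_polar = to_polar(lane_polyline, center)
```

In this representation a straight approach lane pointing at the center keeps a constant angle while its radius shrinks, which is one plausible reason an intersection-centered polar frame suits approach and departure motion.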

If this is right

  • Enactor recovers the SUMO data generator's speed and travel-time distributions with KL divergence over an order of magnitude lower than a recent transformer baseline on travel time.
  • Enactor reduces red-light violations by more than an order of magnitude relative to the same baseline.
  • An ablation shows the leader rear-bumper feature produces the largest improvement on intersection-aware safety metrics.
  • The same architecture outperforms a constant-velocity baseline at every horizon on naturalistic vehicle trajectories from a fish-eye camera.
  • Pedestrians are treated only as context that can influence vehicle decisions but are not themselves predicted.
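The KL-divergence comparisons in the list above can be made concrete with a histogram-based estimate. This is a minimal sketch, assuming shared bins, additive smoothing, and the direction KL(reference || generated). The paper's exact estimator, bin widths, and direction are not given on this page, and the speed samples below are synthetic placeholders, not results from the paper.

```python
import numpy as np

def histogram_kl(reference, generated, bins, eps=1e-9):
    """Estimate KL(P_ref || P_gen) from samples using shared histogram bins."""
    p, _ = np.histogram(reference, bins=bins)
    q, _ = np.histogram(generated, bins=bins)
    # eps smoothing (an assumption) avoids division by zero in empty bins.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
bins = np.linspace(0.0, 30.0, 31)              # 1 m/s bins, hypothetical
ref_speeds = rng.normal(12.0, 3.0, 5000)       # stand-in for simulator speeds
close_speeds = rng.normal(12.2, 3.0, 5000)     # stand-in for a well-matched model
far_speeds = rng.normal(18.0, 3.0, 5000)       # stand-in for a poorly matched model

kl_close = histogram_kl(ref_speeds, close_speeds, bins)
kl_far = histogram_kl(ref_speeds, far_speeds, bins)
```

The same function applies to travel-time samples. Because the estimate depends on bin width and sample count, comparisons between models are only meaningful when both use identical bins and comparable sample sizes.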

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The polar encoding and dual-attention design may allow the same model to be fine-tuned on limited real-world data without retraining from scratch.
  • Because the model already handles a continuously refreshed actor set, it could be coupled directly to live sensor feeds for on-line traffic forecasting.
  • The closed-loop curriculum's emphasis on long-horizon stability suggests the architecture could be adapted to multi-agent settings beyond intersections, such as highway merging.
  • The ablation result on the leader rear-bumper feature indicates that explicit inclusion of immediate leader state is a high-leverage inductive bias for safety-critical motion models.

Load-bearing premise

A model trained via closed-loop curriculum on SUMO-generated data will remain stable and produce realistic long-horizon behavior when run in full simulation-in-the-loop with a refreshing actor set and when transferred to real field data.

What would settle it

Running the trained Enactor for 10 000+ seconds in closed loop on a new intersection geometry and measuring whether the KL divergence on travel-time distributions stays below the reported baseline level or begins to rise sharply.

Figures

Figures reproduced from arXiv: 2606.23588 by Anand Rangarajan, Rahul Sengupta, Sanjay Ranka, Yash Ranjan.

Figure 2: SUMO is configured with calibrated arrival rates per...
Figure 1: Transformer-based architecture for learning multi-actor interactions and generating trajectories. (a) SUMO initializes the traffic scene from a set of...
Figure 2: Digital representations of the two simulated intersections. Left: SUMO representation of the intersection at West University. Right: The same intersection...
Figure 3: Traffic video is collected through a fish-eye camera at a frame rate of 10 frames per second, and processed through a data processing pipeline that...
Figure 4: Lateral drift from detection error is corrected by fitting a smoothing...
Figure 5: Trajectory comparison between Enactor and the constant-velocity baseline.
Figure 6: Trajectory comparison between Enactor and constant-velocity baselines with different prediction horizons.
Original abstract

Traffic microsimulators rely on hand-crafted behavior models that reproduce aggregate flow but miss the heterogeneous interactions between vehicles at signalized intersections. Learned trajectory predictors capture richer interactions but are short-horizon and tend to be unstable when run in closed loop. We present Enactor, an actor-centric generative model for closed-loop intersection microsimulation. The model focuses on vehicles; pedestrians are included as context that can influence vehicle decisions but are not themselves predicted. Dynamic actors and lane polylines are encoded in polar coordinates referenced to the intersection center. A transformer with separate spatial and temporal attention blocks predicts a distribution over each actor's next-step motion ($s$, $\alpha$). Training uses a closed-loop curriculum so the model is exposed to its own predictions. We evaluate Enactor in two regimes. In a 4000-second simulation-in-the-loop test at two intersection geometries, Enactor controls every dynamic vehicle against a continuously refreshing actor set rather than the fixed cohort that learned trajectory predictors are usually evaluated against. It recovers the SUMO data generator's speed and travel-time distributions with KL divergence over an order of magnitude lower than a recent transformer baseline on travel time, and substantially lower on speed (roughly $5\times$ lower at Site 1), and reduces red-light violations relative to the same baseline by more than an order of magnitude. An ablation isolates the leader rear-bumper feature as the change with the largest effect on intersection-aware safety metrics. We also evaluate on real-world field data: we apply the same architecture to naturalistic vehicle trajectories from a fish-eye camera at a signalized intersection and evaluate it on multi-horizon prediction tasks. Enactor outperforms a constant-velocity baseline at every horizon evaluated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces Enactor, an actor-centric generative model for closed-loop microsimulation of signalized intersections. Vehicles are encoded in polar coordinates relative to the intersection center; a transformer with separate spatial and temporal attention blocks outputs a distribution over each actor's next-step motion (s, α). Training employs a closed-loop curriculum. In 4000-second simulation-in-the-loop experiments at two sites, Enactor controls every dynamic vehicle against a continuously refreshing actor set and recovers the SUMO generator's speed and travel-time distributions with KL divergence more than an order of magnitude lower than a recent transformer baseline on travel time (roughly 5× lower on speed at Site 1) while reducing red-light violations by more than an order of magnitude. An ablation identifies the leader rear-bumper feature as most impactful on safety metrics. On real-world fish-eye camera trajectories the same architecture outperforms a constant-velocity baseline on multi-horizon open-loop prediction.
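The constant-velocity baseline mentioned in the summary is simple enough to sketch directly. The version below is an illustrative assumption rather than the paper's implementation: the function names are hypothetical, velocity is taken from the last two observed frames, and the ground-truth trajectory is a synthetic vehicle braking at 5 m/s² so that the error growth with horizon is easy to see. The 0.1 s step matches the 10 frames per second stated in the Figure 3 caption.

```python
import numpy as np

def constant_velocity_forecast(observed_xy, horizon, dt):
    """Extrapolate the last observed velocity for `horizon` future steps."""
    observed_xy = np.asarray(observed_xy, dtype=float)
    velocity = (observed_xy[-1] - observed_xy[-2]) / dt
    steps = np.arange(1, horizon + 1)[:, None]
    return observed_xy[-1] + steps * dt * velocity

def displacement_error(pred_xy, true_xy):
    """Per-step Euclidean distance between predicted and true positions."""
    return np.linalg.norm(pred_xy - true_xy, axis=1)

dt = 0.1                                   # seconds per frame (10 frames per second)
horizon = 20                               # 20 steps = 2 s
observed = [(0.0, 0.0), (1.0, 0.0)]        # implies 10 m/s along +x

# Hypothetical ground truth: the vehicle brakes at 5 m/s^2 and stops after 2 s.
t = np.arange(1, horizon + 1) * dt
true_x = 1.0 + 10.0 * t - 0.5 * 5.0 * t**2
true_future = np.stack([true_x, np.zeros_like(true_x)], axis=1)

pred_future = constant_velocity_forecast(observed, horizon, dt)
errors = displacement_error(pred_future, true_future)
```

For this braking vehicle the baseline's error grows quadratically with the horizon, reaching about 10 m at 2 s. This is why multi-horizon evaluation near a signal is a meaningful, if modest, bar for a learned predictor.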

Significance. If the reported gains in distributional fidelity and violation reduction hold under the closed-loop regime with refreshing actors, the work supplies a concrete, reproducible route to stable long-horizon learned microsimulation that improves upon both hand-crafted SUMO models and short-horizon trajectory predictors. The explicit separation of closed-loop SUMO evaluation from open-loop real-world evaluation, together with the leader-rear-bumper ablation, strengthens the central claim that curriculum training plus actor-centric polar encoding enables realistic intersection behavior.

minor comments (3)
  1. [Abstract, §4] Abstract and §4: the factor-of-5× speed KL improvement at Site 1 and the order-of-magnitude travel-time and violation claims should be accompanied by the exact KL values, sample sizes, and any statistical significance tests in the main results tables so that readers can assess effect magnitude directly.
  2. [§3.2] §3.2: the precise definition of the polar encoding (origin at intersection center, reference frame for α) and the handling of lane-polyline context should be stated with an equation or pseudocode; current prose leaves the coordinate transformation ambiguous for re-implementation.
  3. [Introduction, Conclusion] The manuscript states that real-world evaluation is restricted to open-loop multi-horizon prediction; this distinction from the closed-loop SUMO regime should be reiterated in the introduction and conclusion to prevent readers from conflating the two settings.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and accurate summary of our work, the positive assessment of its significance, and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript describes an empirical generative model (Enactor) trained via closed-loop curriculum on SUMO data and evaluated in 4000-second simulation-in-the-loop runs that recover the generator's speed/travel-time distributions with lower KL than a transformer baseline, plus open-loop multi-horizon prediction on independent real-world camera trajectories. The closed-loop curriculum is a standard technique for exposing the model to its own outputs and does not force the reported distribution matches or violation reductions by construction; the baseline comparison and separate real-world test supply independent content. No self-definitional equations, fitted inputs renamed as predictions, load-bearing self-citations, or imported uniqueness theorems appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard neural-network assumptions about attention mechanisms and data representation choices; no explicit free parameters beyond training are described in the abstract.

axioms (2)
  • domain assumption Transformer attention blocks can effectively capture spatial and temporal dependencies in multi-actor trajectory data
    Invoked in the model architecture description
  • domain assumption Polar coordinate encoding centered at the intersection is adequate to represent lane polylines and actor states
    Used for input representation


