
arXiv: 2505.20414 · v2 · submitted 2025-05-26 · cs.CV · cs.AI · cs.RO

RetroMotion: Retrocausal Motion Forecasting Models are Instructable

Pith reviewed 2026-05-19 12:39 UTC · model grok-4.3

classification: cs.CV, cs.AI, cs.RO
keywords: motion forecasting, multi-agent prediction, retrocausal modeling, transformer, instruction following, joint trajectory distribution, trajectory prediction, autonomous driving

The pith

Transformer motion models generate joint agent trajectories via retrocausal re-encoding of marginals and implicitly follow user instructions after standard training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-agent motion forecasts can be decomposed into marginal distributions for individual agents and joint distributions for interacting ones. A transformer generates the joints by re-encoding the marginal trajectories and then applying pairwise modeling, which creates a retrocausal flow passing information from later marginal points back to earlier joint points. Positional uncertainty at each step is captured with compressed exponential power distributions. If this holds, forecasting systems for road users would produce accurate interacting trajectories while also accepting and adapting to high-level instructions without any special instruction-tuning stage. A sympathetic reader cares because this turns a passive predictor into one that can be guided on the fly in complex scenes.

Core claim

Using a transformer model, joint distributions are generated by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, positional uncertainty is modeled using compressed exponential power distributions. The resulting models achieve strong results on the Waymo Interaction Prediction Challenge, generalize to Argoverse 2 and V2X-Seq, and follow instructions that adapt to scene context after ordinary motion-forecasting training.
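The per-step uncertainty model can be made more concrete with a small sketch. The code below evaluates the negative log-likelihood of the textbook exponential power (generalized normal) family, whose shape parameter interpolates between a Laplace density (beta = 1) and a normal density (beta = 2). Treating this family as a stand-in is an assumption: the paper's compressed variant and its exact parameterization are not given on this page, and the names `exp_power_nll`, `alpha`, and `beta` are illustrative only.

```python
import math


def exp_power_nll(x, mu, alpha, beta):
    """NLL of the generalized normal density (textbook form, assumed here):
    f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return (abs(x - mu) / alpha) ** beta - log_norm


def laplace_nll(x, mu, b):
    """NLL of a Laplace density with scale b (the beta = 1 special case)."""
    return math.log(2.0 * b) + abs(x - mu) / b


def normal_nll(x, mu, sigma):
    """NLL of a normal density (the beta = 2 case with alpha = sigma * sqrt(2))."""
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + (x - mu) ** 2 / (2.0 * sigma**2)
```

With `beta=1` the function reduces to the Laplace NLL, and with `beta=2` and `alpha = sigma * sqrt(2)` it reduces to the normal NLL. A two-dimensional position would need one such term per coordinate or a joint form, which this sketch does not attempt.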

What carries the argument

Retrocausal flow created by re-encoding marginal trajectory distributions then performing pairwise joint modeling inside the transformer.
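A toy sketch may help show where the retrocausal flow comes from. It follows the Figure 1 caption only loosely: each marginal trajectory mode is re-encoded into a single query, and a joint trajectory pair is decoded from queries that share the same index for both agents. Everything else is an assumption made for illustration: the single dense layers with random weights, the `tanh` activation, the sizes `T`, `K`, `D`, and all function names. The exchange between queries and scene context is omitted entirely.

```python
import math
import random


def linear(vec, weights, bias):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode_queries(marginal_modes, layer):
    """Re-encode each marginal mode (a list of (x, y) points) into one query."""
    weights, bias = layer
    queries = []
    for traj in marginal_modes:
        flat = [c for point in traj for c in point]  # all time steps at once
        queries.append([math.tanh(v) for v in linear(flat, weights, bias)])
    return queries


def decode_joint(queries_a, queries_b, layer, horizon):
    """Decode one joint trajectory pair per index k from (queries_a[k], queries_b[k])."""
    weights, bias = layer
    joint = []
    for q_a, q_b in zip(queries_a, queries_b):  # same-index pairs (1, 1), (2, 2), ...
        out = linear(q_a + q_b, weights, bias)
        points = [(out[2 * i], out[2 * i + 1]) for i in range(2 * horizon)]
        joint.append((points[:horizon], points[horizon:]))
    return joint


rng = random.Random(0)
T, K, D = 4, 3, 8  # time steps, modes, query width (all hypothetical)
marginals_a = [[(rng.random(), rng.random()) for _ in range(T)] for _ in range(K)]
marginals_b = [[(rng.random(), rng.random()) for _ in range(T)] for _ in range(K)]
enc = init_layer(2 * T, D, rng)
dec = init_layer(2 * D, 4 * T, rng)
joint = decode_joint(
    encode_queries(marginals_a, enc), encode_queries(marginals_b, enc), dec, T
)
```

Because each query is computed from the whole marginal trajectory, the first point of a decoded joint trajectory depends on the last point of the marginal one. That dependence of earlier joint points on later marginal points is the sense of "retrocausal" used above; editing a marginal trajectory is then one plausible way to issue an instruction.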

Load-bearing premise

Re-encoding marginal distributions and performing only pairwise modeling is sufficient to capture necessary multi-agent interactions for both accurate joints and instruction following.

What would settle it

A controlled experiment in which the model receives an explicit instruction to alter behavior yet produces joint trajectories that ignore the instruction or violate scene constraints would show the implicit instructability claim is false.
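One way such an experiment could score compliance is sketched below. It measures the signed heading change between the first and last segment of a predicted 2D trajectory and treats a sufficiently large positive change as a left turn. The function names, the 45 degree threshold, and the example trajectories are all hypothetical; the paper's own evaluation protocol is not described on this page, and a full test would also need a scene-constraint check (for example lane membership) that this sketch leaves out.

```python
import math


def net_heading_change(traj):
    """Signed heading change in radians between the first and last path segment."""
    (x0, y0), (x1, y1) = traj[0], traj[1]
    (xa, ya), (xb, yb) = traj[-2], traj[-1]
    start = math.atan2(y1 - y0, x1 - x0)
    end = math.atan2(yb - ya, xb - xa)
    # Wrap to [-pi, pi] so a left turn is positive and a right turn is negative.
    return math.atan2(math.sin(end - start), math.cos(end - start))


def follows_turn_left(traj, min_angle=math.radians(45)):
    """True if the trajectory turns left by at least min_angle (assumed threshold)."""
    return net_heading_change(traj) >= min_angle


# Hand-made example paths, not model outputs.
left_turn = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (2.5, 1.5), (2.5, 2.5)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
right_turn = [(0.0, 0.0), (1.0, 0.0), (2.0, -0.5), (2.5, -1.5), (2.5, -2.5)]
```

Here `follows_turn_left(left_turn)` is true while the straight and right-turning paths fail the check. Comparing this rate between instructed and uninstructed forecasts would give a simple signal for whether instructions are ignored.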

Figures

Figures reproduced from arXiv: 2505.20414 by Abhishek Vivekanandan, Carlos Fernandez, Christoph Stiller, Dominik Strutz, Felix Hauser, Jaime Villa, Marlon Steiner, Omer Sahin Tas, Royden Wagner, Yinzhe Shen.

Figure 1: From marginal to joint trajectory distributions. Left part: We use an MLP to generate query matrices Q from marginal trajectories and exchange information between queries and scene context. Scene context representations are learned by our scene encoder (see Section 3.5). Right part: Afterwards, we decode joint trajectories P^joint_{1:T} from pairs of queries at the same index for both agents ((1, 1), (2, 2), …
Figure 2: Joint and marginal motion forecasts of our model. Dynamic agents are shown in blue, static agents in grey (determined at t = 0 s). Lanes are black lines, road markings are white lines, and traffic light states are shown as colored spheres. (a) and (b): Top-1 mode of joint motion forecasts on the Waymo Open Motion and Argoverse 2 datasets. (c) Marginal forecasts on the V2X-Seq dataset. (d) 6 modes of a margi…
Figure 3: Adapting a basic turn left instruction to the scene context. The upper plot shows the default marginal trajectory forecast of our model. The middle plot shows our basic turn left instructions, which violate traffic rules by turning into the oncoming lanes. The lower plot shows that our model responds to this instruction by adapting the trajectory of the right vehicle to its lane (shown as black line) an…
Figure 4: Mixture weight of normal components in exponential power distributions (see Equation (2)). The weight w progressively increases, reaching higher values for joint trajectory distributions than for marginal ones. However, w remains below 0.15, indicating that the learned distributions are Laplace-like. Initially, the weight is close to 0, because the negative log-likelihood (NLL) of a normal distribution i…
Figure 5: Neural regression collapse for motion forecasting. We measure the NRC1 metric for feature vectors of marginal and joint trajectory distributions. There is an immediate collapse in the upper plot, but none in the lower plot. Therefore, the true dimensionality lies between 32 and 272. This suggests that other density parameters besides the 32 location parameters are important.
Original abstract

Motion forecasts of road users (i.e., agents) vary in complexity depending on the number of agents, scene constraints, and interactions. In particular, the output space of joint trajectory distributions grows exponentially with the number of agents. Therefore, we decompose multi-agent motion forecasts into (1) marginal distributions for all modeled agents and (2) joint distributions for interacting agents. Using a transformer model, we generate joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, we model the positional uncertainty using compressed exponential power distributions. Notably, our method achieves strong results in the Waymo Interaction Prediction Challenge and generalizes well to the Argoverse 2 and V2X-Seq datasets. Additionally, our method provides an interface for issuing instructions. We show that standard motion forecasting training implicitly enables the model to follow instructions and adapt them to the scene context. GitHub repository: https://github.com/kit-mrt/future-motion
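The exponential growth mentioned in the abstract can be illustrated with a short counting sketch. It compares the number of mode combinations in a full joint over N agents with K modes each against a decomposition into per-agent marginals plus same-index pairwise joints. The counting scheme and the function names are assumptions made for illustration and are not taken from the paper.

```python
from math import comb


def full_joint_modes(num_agents, modes_per_agent):
    """Mode combinations if every agent's K modes can combine with all others."""
    return modes_per_agent**num_agents


def decomposed_modes(num_agents, modes_per_agent, interacting_pairs=None):
    """Modes to score for marginals plus same-index pairwise joints.

    interacting_pairs defaults to all N-choose-2 pairs, an upper bound;
    fewer pairs apply when only some agents are treated as interacting.
    """
    if interacting_pairs is None:
        interacting_pairs = comb(num_agents, 2)
    marginal = num_agents * modes_per_agent
    joint = interacting_pairs * modes_per_agent
    return marginal + joint


print(full_joint_modes(8, 6))   # 6 ** 8 = 1679616
print(decomposed_modes(8, 6))   # 8 * 6 + 28 * 6 = 216
```

Under these assumptions, eight agents with six modes each give about 1.7 million full joint combinations but only 216 modes in the decomposed form, which grows quadratically rather than exponentially in the number of agents.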

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RetroMotion, a transformer-based approach to multi-agent motion forecasting that decomposes the task into per-agent marginal trajectory distributions and pairwise joint distributions for interacting agents. Joints are produced by re-encoding the marginals and applying pairwise modeling that injects retrocausal information from later marginal timesteps into earlier joint timesteps. Positional uncertainty at each timestep is represented by compressed exponential power distributions. The method reports competitive performance on the Waymo Interaction Prediction Challenge, cross-dataset generalization to Argoverse 2 and V2X-Seq, and an emergent ability to follow instructions, issued through an interface the method provides, after standard motion-forecasting training.

Significance. If the reported empirical gains prove robust, the decomposition plus retrocausal re-encoding offers a practical route to scaling joint forecasting without enumerating the full exponential joint space. The public repository aids reproducibility. The instructability result, if confirmed, would be a useful side-benefit of standard training regimes. Significance is currently limited by the absence of error bars, detailed ablations on interaction order, and explicit baseline tables in the abstract.

major comments (2)
  1. [Method / Joint distribution construction] The central modeling choice—re-encoding marginals followed by pairwise joint modeling—must be shown to capture higher-order (3+-agent) interactions that cannot be factored into pairs. The abstract and method description give no explicit validation or ablation on scenes containing simultaneous three-or-more-agent constraints; if such groups are simply ignored or approximated, the joint-distribution claim is load-bearing and requires supporting experiments or theoretical justification.
  2. [Experiments / Waymo results] Abstract and experimental claims rest on “strong results” and “good generalization” without reported error bars, standard-deviation across seeds, or side-by-side numerical tables against published baselines. This prevents assessment of whether the retrocausal component or the distributional choice actually drives the gains.
minor comments (2)
  1. [Method / Uncertainty modeling] Define the precise parameterization and fitting procedure for the compressed exponential power distributions; the current description leaves the number of free parameters and any scene-dependent conditioning unclear.
  2. [Experiments] Add a short table or paragraph listing the exact baseline methods and their scores on the same Waymo Interaction Prediction Challenge split used for the reported numbers.
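The concern in major comment 1 can be illustrated with a standard toy example from probability, unrelated to the paper's data. Two distributions over three binary choices (say, go = 1 or yield = 0 for three agents) can share identical pairwise marginals while differing as joints: one is fully independent, the other only allows an even number of agents to go. The variable names and the agent interpretation are assumptions for illustration.

```python
from itertools import combinations, product


def pairwise_marginals(dist):
    """Return {(i, j): {(a, b): prob}} for a distribution over binary triples."""
    out = {}
    for i, j in combinations(range(3), 2):
        table = {}
        for state, p in dist.items():
            key = (state[i], state[j])
            table[key] = table.get(key, 0.0) + p
        out[(i, j)] = table
    return out


states = list(product((0, 1), repeat=3))

# Independent agents: every combination is equally likely.
independent = {s: 1 / 8 for s in states}

# Three-way constraint: only combinations with an even number of ones occur.
parity = {s: (1 / 4 if sum(s) % 2 == 0 else 0.0) for s in states}

same_pairs = pairwise_marginals(independent) == pairwise_marginals(parity)
same_joint = independent == parity
```

Here `same_pairs` is true and `same_joint` is false, so pairwise statistics alone cannot distinguish the two cases. This does not show that the paper's model fails on such scenes, since it conditions on scene context and re-encoded marginals; it only shows why an explicit ablation on groups of three or more agents would be informative.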

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying our approach and outlining revisions that will strengthen the manuscript.

  1. Referee: [Method / Joint distribution construction] The central modeling choice—re-encoding marginals followed by pairwise joint modeling—must be shown to capture higher-order (3+-agent) interactions that cannot be factored into pairs. The abstract and method description give no explicit validation or ablation on scenes containing simultaneous three-or-more-agent constraints; if such groups are simply ignored or approximated, the joint-distribution claim is load-bearing and requires supporting experiments or theoretical justification.

    Authors: We agree that higher-order interactions represent an important consideration. Our decomposition into marginals and pairwise joints is explicitly presented as a scalable approximation to the full joint distribution, which grows exponentially with agent count. The retrocausal re-encoding is intended to allow interaction information to propagate across timesteps even within this pairwise structure. To directly address the concern, the revised manuscript will include a new ablation evaluating performance on data subsets containing three or more simultaneously interacting agents, together with a discussion of the approximation's empirical behavior and theoretical motivation. revision: yes

  2. Referee: [Experiments / Waymo results] Abstract and experimental claims rest on “strong results” and “good generalization” without reported error bars, standard-deviation across seeds, or side-by-side numerical tables against published baselines. This prevents assessment of whether the retrocausal component or the distributional choice actually drives the gains.

    Authors: We acknowledge that the current presentation would benefit from greater statistical detail. In the revision we will report standard deviations computed across multiple random seeds for the primary Waymo metrics and will add an explicit side-by-side numerical table in both the abstract and results section that directly compares our method against the published baseline numbers using the official challenge metrics. revision: yes
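The seed-level reporting promised in the second response could be computed along the following lines. This is a minimal sketch using Python's standard `statistics` module; the function name is hypothetical and the scores are placeholder values chosen for illustration, not results from the paper.

```python
import statistics


def summarize_seeds(scores):
    """Mean, sample standard deviation, and standard error over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return {"mean": mean, "std": std, "sem": std / len(scores) ** 0.5, "n": len(scores)}


# Placeholder values for illustration only; these are NOT results from the paper.
placeholder_scores = [0.20, 0.22, 0.21, 0.19, 0.23]
summary = summarize_seeds(placeholder_scores)
print(f"{summary['mean']:.3f} +/- {summary['std']:.3f} (n={summary['n']})")
```

Reporting the number of seeds alongside the spread lets a reader judge whether a gap to a published baseline exceeds run-to-run variation.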

Circularity Check

0 steps flagged

No significant circularity; modeling choices validated on external benchmarks


The paper presents an architectural construction for multi-agent motion forecasting: decomposing forecasts into per-agent marginal distributions and selected pairwise joint distributions, then generating the joints via re-encoding of marginal trajectories followed by pairwise transformer modeling that injects retrocausal information. Performance is reported on external public challenges and datasets (Waymo Interaction Prediction Challenge, Argoverse 2, V2X-Seq) with no equations shown that reduce the claimed joint distributions or benchmark scores to quantities defined solely by the model's own fitted parameters. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text; the approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that pairwise joints after marginal re-encoding suffice for interaction modeling and on standard transformer training dynamics for the instruction property; no new physical entities are postulated.

free parameters (1)
  • parameters of compressed exponential power distributions
    Used per time step to model positional uncertainty; specific values or fitting procedure not stated in abstract.
axioms (1)
  • domain assumption: Decomposition of joint trajectory distributions into marginals plus pairwise interactions is sufficient to represent multi-agent scene dynamics
    Invoked when the paper states it decomposes forecasts into marginal distributions for all agents and joint distributions for interacting agents.

pith-pipeline@v0.9.0 · 5750 in / 1359 out tokens · 39655 ms · 2026-05-19T12:39:08.145165+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling

    cs.RO 2026-05 unverdicted novelty 5.0

    CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.

  2. Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling

    cs.RO 2026-05 unverdicted novelty 5.0

    CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.
