RetroMotion: Retrocausal Motion Forecasting Models are Instructable

Abhishek Vivekanandan; Carlos Fernandez; Christoph Stiller; Dominik Strutz; Felix Hauser; Jaime Villa; Marlon Steiner; Omer Sahin Tas; Royden Wagner; Yinzhe Shen

arXiv: 2505.20414 · v2 · submitted 2025-05-26 · cs.CV · cs.AI · cs.RO

Pith reviewed 2026-05-19 12:39 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.RO

keywords: motion forecasting, multi-agent prediction, retrocausal modeling, transformer, instruction following, joint trajectory distribution, trajectory prediction, autonomous driving

The pith

Transformer motion models generate joint agent trajectories via retrocausal re-encoding of marginals and implicitly follow user instructions after standard training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-agent motion forecasts can be decomposed into marginal distributions for individual agents and joint distributions for interacting ones. A transformer generates the joints by re-encoding the marginal trajectories and then applying pairwise modeling, which creates a retrocausal flow passing information from later marginal points back to earlier joint points. Positional uncertainty at each step is captured with compressed exponential power distributions. If this holds, forecasting systems for road users would produce accurate interacting trajectories while also accepting and adapting to high-level instructions without any special instruction-tuning stage. A sympathetic reader cares because this turns a passive predictor into one that can be guided on the fly in complex scenes.

Core claim

Using a transformer model, joint distributions are generated by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, positional uncertainty is modeled using compressed exponential power distributions. The resulting models achieve strong results on the Waymo Interaction Prediction Challenge, generalize to Argoverse 2 and V2X-Seq, and follow instructions that adapt to scene context after ordinary motion-forecasting training.
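The per-time-step uncertainty model can be illustrated with the textbook exponential power (generalized normal) density. This is a minimal sketch, not the paper's parameterization: the "compressed" variant is not specified on this page, and the function names and the symbols `alpha` (scale) and `beta` (shape) are assumptions chosen for illustration. The sketch shows that shape 1 recovers a Laplace density and shape 2 recovers a normal density.

```python
import math


def exp_power_nll(x, mu, alpha, beta):
    """Negative log-likelihood of the generalized normal density
    p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta).
    Names and parameterization are illustrative assumptions."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return (abs(x - mu) / alpha) ** beta - log_norm


def laplace_nll(x, mu, b):
    """Negative log-likelihood of a Laplace density with scale b."""
    return math.log(2.0 * b) + abs(x - mu) / b


def normal_nll(x, mu, sigma):
    """Negative log-likelihood of a normal density with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2.0 * sigma ** 2)


# beta = 1 matches Laplace with b = alpha;
# beta = 2 matches normal with sigma = alpha / sqrt(2).
laplace_gap = abs(exp_power_nll(0.7, 0.2, 1.5, 1.0) - laplace_nll(0.7, 0.2, 1.5))
normal_gap = abs(exp_power_nll(0.7, 0.2, 1.5 * 2 ** 0.5, 2.0) - normal_nll(0.7, 0.2, 1.5))
```

A shape parameter between 1 and 2 interpolates between the heavier Laplace tails and the lighter normal tails, which is one plausible reason to prefer this family for positional errors.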

What carries the argument

Retrocausal flow created by re-encoding marginal trajectory distributions then performing pairwise joint modeling inside the transformer.
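The retrocausal flow can be made concrete with a toy sketch. It assumes nothing about the real architecture beyond what the summary states: a whole marginal trajectory is re-encoded into a query, and joint steps are decoded from a pair of queries. The functions `encode_query` and `decode_joint`, their fixed weights, and the example trajectories are all hypothetical stand-ins for learned components.

```python
def encode_query(marginal_traj):
    """Toy stand-in for the MLP: summarizes the WHOLE marginal trajectory."""
    flat = [coord for point in marginal_traj for coord in point]
    mean_feature = sum(flat) / len(flat)
    # Position-weighted feature, so later points also influence the query.
    weighted_feature = sum((i + 1) * c for i, c in enumerate(flat)) / len(flat)
    return [mean_feature, weighted_feature]


def decode_joint(query_a, query_b, num_steps):
    """Toy pairwise decoder: every joint step is computed from both full queries."""
    joint = []
    for t in range(num_steps):
        scale = (t + 1) / num_steps
        pos_a = (scale * query_a[0] + 0.1 * query_b[0],
                 scale * query_a[1] + 0.1 * query_b[1])
        pos_b = (scale * query_b[0] + 0.1 * query_a[0],
                 scale * query_b[1] + 0.1 * query_a[1])
        joint.append((pos_a, pos_b))
    return joint


traj_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj_b = [(5.0, 5.0), (5.0, 4.0), (5.0, 3.0)]
# Change only the LAST marginal point of agent A.
edited_a = traj_a[:-1] + [(2.0, 1.0)]

baseline = decode_joint(encode_query(traj_a), encode_query(traj_b), num_steps=3)
edited = decode_joint(encode_query(edited_a), encode_query(traj_b), num_steps=3)

# The FIRST joint step differs although only the last marginal point changed.
first_step_changed = baseline[0] != edited[0]
```

Because each query summarizes the full marginal trajectory, an edit at the final marginal time step reaches the earliest joint time step. That is the sense in which information flows from later marginal points to earlier joint points.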

Load-bearing premise

Re-encoding marginal distributions and performing only pairwise modeling is sufficient to capture necessary multi-agent interactions for both accurate joints and instruction following.

What would settle it

A controlled experiment in which the model receives an explicit instruction to alter behavior yet produces joint trajectories that ignore the instruction or violate scene constraints would show the implicit instructability claim is false.
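One ingredient of such an experiment is an automatic check of whether a forecast trajectory complies with an instruction. The sketch below is a hypothetical compliance check for a "turn left" instruction, not a procedure from the paper. It assumes a standard x-right, y-up frame in which counter-clockwise rotation means turning left, and the 45 degree threshold is an arbitrary illustrative choice.

```python
import math


def net_heading_change(traj):
    """Sum of signed heading changes (radians) along a 2D polyline.
    Positive values mean counter-clockwise rotation (a left turn here)."""
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(traj, traj[1:])
    ]
    total = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap each difference into [-pi, pi) before accumulating.
        total += (h1 - h0 + math.pi) % (2.0 * math.pi) - math.pi
    return total


def follows_turn_left(traj, min_turn_rad=math.radians(45)):
    """Hypothetical check: does the trajectory rotate left by at least the threshold?"""
    return net_heading_change(traj) >= min_turn_rad


left_turn = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
right_turn = [(0.0, 0.0), (1.0, 0.0), (1.0, -1.0)]
```

A complete test would pair a check like this with a scene-constraint check (for example, lane membership), since the claim covers both following the instruction and adapting it to the scene.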

Figures

Figures reproduced from arXiv: 2505.20414 by Abhishek Vivekanandan, Carlos Fernandez, Christoph Stiller, Dominik Strutz, Felix Hauser, Jaime Villa, Marlon Steiner, Omer Sahin Tas, Royden Wagner, Yinzhe Shen.

**Figure 1.** From marginal to joint trajectory distributions. Left part: We use an MLP to generate query matrices Q from marginal trajectories and exchange information between queries and scene context. Scene context representations are learned by our scene encoder (see Section 3.5). Right part: Afterwards, we decode joint trajectories $P^{\mathrm{joint}}_{1:T}$ from pairs of queries at the same index for both agents ((1, 1), (2, 2), …

**Figure 2.** Joint and marginal motion forecasts of our model. Dynamic agents are shown in blue, static agents in grey (determined at t = 0 s). Lanes are black lines, road markings are white lines, and traffic light states are shown as colored spheres. (a) and (b): Top1 mode of joint motion forecasts on the Waymo Open Motion and Argoverse 2 datasets. (c) Marginal forecasts on the V2X-Seq dataset. (d) 6 modes of a margi…

**Figure 3.** Adapting a basic turn left instruction to the scene context. The upper plot shows the default marginal trajectory forecast of our model. The middle plot shows our basic turn left instructions, which violate traffic rules by turning into the oncoming lanes. The lower plot shows that our model responds to this instruction by adapting the trajectory of the right vehicle to its lane (shown as black line) an…

**Figure 4.** Mixture weight of normal components in exponential power distributions (see Equation (2)). The weight w progressively increases, reaching higher values for joint trajectory distributions than for marginal ones. However, w remains below 0.15, indicating that the learned distributions are Laplace-like. Initially, the weight is close to 0, because the negative log-likelihood (NLL) of a normal distribution i…

**Figure 5.** Neural regression collapse for motion forecasting. We measure the NRC1 metric for feature vectors of marginal and joint trajectory distributions. There is an immediate collapse in the upper plot, but none in the lower plot. Therefore, the true dimensionality lies between 32 and 272. This suggests that other density parameters besides the 32 location parameters are important.

Original abstract

Motion forecasts of road users (i.e., agents) vary in complexity depending on the number of agents, scene constraints, and interactions. In particular, the output space of joint trajectory distributions grows exponentially with the number of agents. Therefore, we decompose multi-agent motion forecasts into (1) marginal distributions for all modeled agents and (2) joint distributions for interacting agents. Using a transformer model, we generate joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, we model the positional uncertainty using compressed exponential power distributions. Notably, our method achieves strong results in the Waymo Interaction Prediction Challenge and generalizes well to the Argoverse 2 and V2X-Seq datasets. Additionally, our method provides an interface for issuing instructions. We show that standard motion forecasting training implicitly enables the model to follow instructions and adapt them to the scene context. GitHub repository: https://github.com/kit-mrt/future-motion
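The scaling argument in the abstract can be checked with simple counting. The sketch below is a back-of-the-envelope illustration, not the paper's accounting: it assumes a fixed number of trajectory modes per agent, counts every combination of per-agent modes for the full joint, and counts one set of modes per agent plus one set per modeled pair for the decomposition. The function names and the example of 8 agents, 6 modes, and 1 interacting pair are assumptions.

```python
def joint_combinations(num_agents, modes_per_agent):
    """Mode combinations if every agent's modes are combined jointly."""
    return modes_per_agent ** num_agents


def decomposed_outputs(num_agents, num_pairs, modes_per_agent):
    """Modes to predict with per-agent marginals plus per-pair joints."""
    marginal_modes = num_agents * modes_per_agent
    pairwise_modes = num_pairs * modes_per_agent
    return marginal_modes + pairwise_modes


# Illustrative scene: 8 agents, 6 modes each, 1 interacting pair.
full = joint_combinations(num_agents=8, modes_per_agent=6)
decomposed = decomposed_outputs(num_agents=8, num_pairs=1, modes_per_agent=6)
```

Under these assumptions the full joint has 6^8 = 1,679,616 mode combinations, while the decomposition predicts 54 modes, growing linearly in the number of agents and pairs.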

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Retrocausal re-encoding of marginals into pairwise joints gives competitive Waymo numbers and implicit instructability, but the pairwise limit is the real question mark for larger groups.

The bottom line is that this paper gets joint distributions by re-encoding marginal trajectories with a retrocausal transformer pass and then doing pairwise modeling, and it turns out standard training already lets the model take instructions and adapt them to the scene. That combination is what stands out.

The retrocausal flow moves information from later marginal timesteps back into earlier joint timesteps, which is a concrete way to inject future context without changing the overall decomposition. They also pick compressed exponential power distributions for the per-timestep positional uncertainty, and they report solid numbers on the Waymo Interaction Prediction Challenge plus reasonable transfer to Argoverse 2 and V2X-Seq. The public repo is there, so the implementation can be checked directly. The implicit instructability result is useful if it holds, because it suggests controllability comes for free rather than needing extra objectives or fine-tuning. That part feels like a practical win for anyone who wants to steer forecasts without retraining from scratch.

The soft spot is exactly the one the stress test flags. Once you move past pairs, the joint over three or more agents can have non-factorizable dependencies that pairwise terms after marginal re-encoding may not capture. The paper does not appear to spell out how larger simultaneous interactions are handled or whether they ran targeted checks against that failure mode. Without those details or error bars on the main tables, it is hard to know how much the reported gains depend on the specific scenes in the benchmarks. The citation pattern looks standard for the field and the work is benchmark-driven rather than circular, which is fine.

This is aimed at people building multi-agent predictors for driving stacks who care about both accuracy and some level of controllability. A reader already working on transformer-based forecasters or on adding interfaces to these models will find the retrocausal trick and the instruction result worth looking at. The empirical grounding and the repo are enough to justify sending it to referees instead of desk-rejecting it, even if the higher-order interaction question needs to be tightened in revision.
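The concern about dependencies beyond pairs can be illustrated with a textbook example that is unrelated to the paper's data or model: three binary variables where the third is the XOR of the first two. Every pair looks like two independent fair bits, yet the triple is fully constrained. The sketch below only shows that pairwise distributions can miss a three-way dependency in principle; it says nothing about how often this occurs in driving scenes.

```python
from collections import Counter
from itertools import product

# X and Y are independent fair bits and Z = X xor Y: four equally likely triples.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]


def pair_marginal(i, j):
    """Distribution of variables i and j, marginalizing out the third."""
    counts = Counter((o[i], o[j]) for o in outcomes)
    return {pair: n / len(outcomes) for pair, n in counts.items()}


uniform_pair = {(a, b): 0.25 for a, b in product((0, 1), repeat=2)}

# Every pairwise marginal is uniform, as for independent bits.
pairwise_uniform = all(
    pair_marginal(i, j) == uniform_pair for i, j in ((0, 1), (0, 2), (1, 2))
)

# The joint still excludes half of the eight possible triples.
joint_support = set(outcomes)
```

An ablation on scenes with three or more simultaneously interacting agents would indicate how much of this gap matters in practice.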

Referee Report

2 major / 2 minor

Summary. The paper introduces RetroMotion, a transformer-based approach to multi-agent motion forecasting that decomposes the task into per-agent marginal trajectory distributions and pairwise joint distributions for interacting agents. Joint distributions are produced by re-encoding the marginals and applying pairwise modeling, which injects retrocausal information from later marginal timesteps into earlier joint timesteps. Positional uncertainty at each timestep is represented by compressed exponential power distributions. The method reports competitive performance on the Waymo Interaction Prediction Challenge, cross-dataset generalization to Argoverse 2 and V2X-Seq, and an emergent ability to follow instructions and adapt them to the scene context after standard training.

Significance. If the reported empirical gains prove robust, the decomposition plus retrocausal re-encoding offers a practical route to scaling joint forecasting without enumerating the full exponential joint space. The public repository aids reproducibility. The instructability result, if confirmed, would be a useful side-benefit of standard training regimes. Significance is currently limited by the absence of error bars, detailed ablations on interaction order, and explicit baseline tables in the abstract.

major comments (2)

[Method / Joint distribution construction] The central modeling choice—re-encoding marginals followed by pairwise joint modeling—must be shown to capture higher-order (3+-agent) interactions that cannot be factored into pairs. The abstract and method description give no explicit validation or ablation on scenes containing simultaneous three-or-more-agent constraints; if such groups are simply ignored or approximated, the joint-distribution claim is load-bearing and requires supporting experiments or theoretical justification.
[Experiments / Waymo results] Abstract and experimental claims rest on “strong results” and “good generalization” without reported error bars, standard-deviation across seeds, or side-by-side numerical tables against published baselines. This prevents assessment of whether the retrocausal component or the distributional choice actually drives the gains.

minor comments (2)

[Method / Uncertainty modeling] Define the precise parameterization and fitting procedure for the compressed exponential power distributions; the current description leaves the number of free parameters and any scene-dependent conditioning unclear.
[Experiments] Add a short table or paragraph listing the exact baseline methods and their scores on the same Waymo Interaction Prediction Challenge split used for the reported numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying our approach and outlining revisions that will strengthen the manuscript.

Referee: [Method / Joint distribution construction] The central modeling choice—re-encoding marginals followed by pairwise joint modeling—must be shown to capture higher-order (3+-agent) interactions that cannot be factored into pairs. The abstract and method description give no explicit validation or ablation on scenes containing simultaneous three-or-more-agent constraints; if such groups are simply ignored or approximated, the joint-distribution claim is load-bearing and requires supporting experiments or theoretical justification.

Authors: We agree that higher-order interactions represent an important consideration. Our decomposition into marginals and pairwise joints is explicitly presented as a scalable approximation to the full joint distribution, which grows exponentially with agent count. The retrocausal re-encoding is intended to allow interaction information to propagate across timesteps even within this pairwise structure. To directly address the concern, the revised manuscript will include a new ablation evaluating performance on data subsets containing three or more simultaneously interacting agents, together with a discussion of the approximation's empirical behavior and theoretical motivation. revision: yes
Referee: [Experiments / Waymo results] Abstract and experimental claims rest on “strong results” and “good generalization” without reported error bars, standard-deviation across seeds, or side-by-side numerical tables against published baselines. This prevents assessment of whether the retrocausal component or the distributional choice actually drives the gains.

Authors: We acknowledge that the current presentation would benefit from greater statistical detail. In the revision we will report standard deviations computed across multiple random seeds for the primary Waymo metrics and will add an explicit side-by-side numerical table in both the abstract and results section that directly compares our method against the published baseline numbers using the official challenge metrics. revision: yes
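The seed statistics promised in the response amount to a mean and a sample standard deviation per metric. A minimal sketch using Python's standard library follows; the function name is hypothetical and the input values are synthetic placeholders chosen for easy arithmetic, not results from the paper.

```python
import statistics


def summarize_across_seeds(metric_per_seed):
    """Return (mean, sample standard deviation) of one metric over training seeds."""
    if len(metric_per_seed) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    mean = statistics.mean(metric_per_seed)
    std = statistics.stdev(metric_per_seed)  # sample std (n - 1 in the denominator)
    return mean, std


# Synthetic placeholder values, NOT measurements from the paper.
placeholder_scores = [1.0, 2.0, 3.0]
mean, std = summarize_across_seeds(placeholder_scores)
```

Reporting the number of seeds alongside the mean and standard deviation lets readers judge how reliable the spread estimate is.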

Circularity Check

0 steps flagged

No significant circularity; modeling choices validated on external benchmarks

The paper presents an architectural construction for multi-agent motion forecasting: decomposing forecasts into per-agent marginal distributions and selected pairwise joint distributions, then generating the joints via re-encoding of marginal trajectories followed by pairwise transformer modeling that injects retrocausal information. Performance is reported on external public challenges and datasets (Waymo Interaction Prediction Challenge, Argoverse 2, V2X-Seq) with no equations shown that reduce the claimed joint distributions or benchmark scores to quantities defined solely by the model's own fitted parameters. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text; the approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the domain assumption that pairwise joints after marginal re-encoding suffice for interaction modeling and on standard transformer training dynamics for the instruction property; no new physical entities are postulated.

free parameters (1)

parameters of compressed exponential power distributions
Used per time step to model positional uncertainty; specific values or fitting procedure not stated in abstract.

axioms (1)

domain assumption: Decomposition of joint trajectory distributions into marginals plus pairwise interactions is sufficient to represent multi-agent scene dynamics
Invoked when the paper states it decomposes forecasts into marginal distributions for all agents and joint distributions for interacting agents.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
cs.RO 2026-05 unverdicted novelty 5.0

CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
cs.RO 2026-05 unverdicted novelty 5.0

CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

The prevalence of neural collapse in neural multivariate regression

George Andriopoulos, Zixuan Dong, Li Guo, Zifan Zhao, and Keith Ross. The prevalence of neural collapse in neural multivariate regression. InNeurIPS, 2025. 4, 8

work page 2025
[2]

Forecasting sequential data using con- sistent koopman autoencoders

Omri Azencot, N Benjamin Erichson, Vanessa Lin, and Michael Mahoney. Forecasting sequential data using con- sistent koopman autoencoders. InICML, 2020. 2

work page 2020
[3]

Eigentrajectory: Low-rank descriptors for multi-modal trajectory forecasting

Inhwan Bae, Jean Oh, and Hae-Gon Jeon. Eigentrajectory: Low-rank descriptors for multi-modal trajectory forecasting. InICCV, 2023. 2

work page 2023
[4]

Mixture density networks

Christopher M Bishop. Mixture density networks. 1994. 3

work page 1994
[5]

Motion planning under uncertainty: In- tegrating learning-based multi-modal predictors into branch model predictive control.arXiv preprint arXiv:2405.03470,

Mohamed-Khalil Bouzidi, Bojan Derajic, Daniel Goehring, and Joerg Reichardt. Motion planning under uncertainty: In- tegrating learning-based multi-modal predictors into branch model predictive control.arXiv preprint arXiv:2405.03470,

work page arXiv
[6]

Implicit latent variable model for scene-consistent motion forecasting

Sergio Casas, Cole Gulino, Simon Suo, Katie Luo, Renjie Liao, and Raquel Urtasun. Implicit latent variable model for scene-consistent motion forecasting. InECCV, 2020. 2

work page 2020
[7]

Multipath: Multiple probabilistic anchor trajec- tory hypotheses for behavior prediction

Yuning Chai, Benjamin Sapp, Mayank Bansal, and Dragomir Anguelov. Multipath: Multiple probabilistic anchor trajec- tory hypotheses for behavior prediction. InCoRL, 2020. 3

work page 2020
[8]

History aware multimodal transformer for vision-and-language navigation

Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, and Ivan Laptev. History aware multimodal transformer for vision-and-language navigation. InNeurIPS, 2021. 2

work page 2021
[9]

Forecast-mae: Self-supervised pre-training for motion forecasting with masked autoencoders

Jie Cheng, Xiaodong Mei, and Ming Liu. Forecast-mae: Self-supervised pre-training for motion forecasting with masked autoencoders. InICCV, 2023. 2

work page 2023
[10]

Gorela: Go relative for viewpoint-invariant motion forecasting

Alexander Cui, Sergio Casas, Kelvin Wong, Simon Suo, and Raquel Urtasun. Gorela: Go relative for viewpoint-invariant motion forecasting. InICRA, 2023. 1

work page 2023
[11]

Bert: Pre-training of deep bidirectional trans- formers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional trans- formers for language understanding. InNAACL, 2019. 2

work page 2019
[12]

Large scale interactive mo- tion forecasting for autonomous driving: The waymo open motion dataset

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R Qi, Yin Zhou, et al. Large scale interactive mo- tion forecasting for autonomous driving: The waymo open motion dataset. InICCV, 2021. 4, 5, 6

work page 2021
[13]

A review of sparse expert models in deep learning

William Fedus, Jeff Dean, and Barret Zoph. A review of sparse expert models in deep learning.arXiv preprint arXiv:2209.01667, 2022. 4

work page arXiv 2022
[14]

Unitraj: A unified framework for scalable vehicle trajectory prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Mes- saoud Ben Amor, ´Eloi Zablocki, Matthieu Cord, and Alexan- dre Alahi. Unitraj: A unified framework for scalable vehicle trajectory prediction. InECCV, 2024. 5

work page 2024
[15]

Vectornet: Encoding hd maps and agent dynamics from vectorized rep- resentation

Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid. Vectornet: Encoding hd maps and agent dynamics from vectorized rep- resentation. InCVPR, 2020. 3

work page 2020
[16]

An ethical trajectory planning algorithm for au- tonomous vehicles.Nature Machine Intelligence, 2023

Maximilian Geisslinger, Franziska Poszler, and Markus Lienkamp. An ethical trajectory planning algorithm for au- tonomous vehicles.Nature Machine Intelligence, 2023. 1, 2

work page 2023
[17]

Thomas: Trajectory heatmap output with learned multi-agent sampling

Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bog- dan Stanciulescu, and Fabien Moutarde. Thomas: Trajectory heatmap output with learned multi-agent sampling. InICLR,

work page
[18]

Latent variable sequential set transformers for joint multi-agent motion prediction,

Roger Girgis, Florian Golemo, Felipe Codevilla, Martin Weiss, Jim Aldon D’Souza, Samira Ebrahimi Kahou, Felix Heide, and Christopher Pal. Latent variable sequential set transformers for joint multi-agent motion prediction.arXiv preprint arXiv:2104.00563, 2021. 2, 5

work page arXiv 2021
[19]

Instruction-driven history-aware policies for robotic manip- ulations

Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia Pinel, Makarand Tapaswi, Ivan Laptev, and Cordelia Schmid. Instruction-driven history-aware policies for robotic manip- ulations. InCoRL, 2023. 2

work page 2023
[20]

Multiple choice learning: Learning to produce multiple structured outputs.NeurIPS, 2012

Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs.NeurIPS, 2012. 1

work page 2012
[21]

Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving

Zhiyu Huang, Haochen Liu, and Chen Lv. Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving. InICCV, 2023. 5

work page 2023
[22]

Intro- ducing probabilistic b ´ezier curves for n-step sequence pre- diction

Ronny Hug, Wolfgang H ¨ubner, and Michael Arens. Intro- ducing probabilistic b ´ezier curves for n-step sequence pre- diction. InAAAI, 2020. 2

work page 2020
[23]

Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991

Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts.Neu- ral computation, 3(1):79–87, 1991. 4

work page 1991
[24]

Motiondiffuser: Controllable multi-agent motion prediction using diffusion

Chiyu Jiang, Andre Cornman, Cheolho Park, Benjamin Sapp, Yin Zhou, Dragomir Anguelov, et al. Motiondiffuser: Controllable multi-agent motion prediction using diffusion. InCVPR, 2023. 1, 2, 4, 5

work page 2023
[25]

Openvla: An open-source vision-language-action model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, et al. Openvla: An open-source vision-language-action model. InCoRL, 2024. 2

work page 2024
[26]

Mpa: Multipath++ based architecture for mo- tion prediction.arXiv preprint arXiv:2206.10041, 2022

Stepan Konev. Mpa: Multipath++ based architecture for mo- tion prediction.arXiv preprint arXiv:2206.10041, 2022. 4

work page arXiv 2022
[27]

SEPT: Towards efficient scene represen- tation learning for motion prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu, Chen Chen, and Shengbo Eben Li. SEPT: Towards efficient scene represen- tation learning for motion prediction. InICLR, 2024. 2

work page 2024
[28]

Stochastic multiple choice learning for training diverse deep ensembles.NeurIPS, 2016

Stefan Lee, Senthil Purushwalkam Shiva Prakash, Michael Cogswell, Viresh Ranjan, David Crandall, and Dhruv Batra. Stochastic multiple choice learning for training diverse deep ensembles.NeurIPS, 2016. 1

work page 2016
[29]

Reasoning multi-agent behavioral topology for interac- tive autonomous driving

Haochen Liu, Li Chen, Yu Qiao, Chen Lv, and Hongyang Li. Reasoning multi-agent behavioral topology for interac- tive autonomous driving. InNeurIPS, 2024. 5

work page 2024
[30]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. InICLR, 2019. 4

work page 2019
[31]

Jfp: Joint future prediction with interactive multi-agent modeling for autonomous driving

Wenjie Luo, Cheol Park, Andre Cornman, Benjamin Sapp, and Dragomir Anguelov. Jfp: Joint future prediction with interactive multi-agent modeling for autonomous driving. In Conference on Robot Learning, 2023. 2, 5

work page 2023
[32]

Learning trajectory dependencies for human motion pre- diction

Wei Mao, Miaomiao Liu, Mathieu Salzmann, and Hongdong Li. Learning trajectory dependencies for human motion pre- diction. InICCV, 2019. 2

work page 2019
[33]

Wayformer: Motion forecasting via simple & efficient attention networks

Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S Refaat, and Benjamin Sapp. Wayformer: Motion forecasting via simple & efficient attention networks. InIEEE International Conference on Robotics and Automa- tion (ICRA), 2023. 1, 2, 3, 5

work page 2023
[34]

Scene transformer: A unified architecture for predicting fu- ture trajectories of multiple agents

Jiquan Ngiam, Vijay Vasudevan, Benjamin Caine, Zheng- dong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, et al. Scene transformer: A unified architecture for predicting fu- ture trajectories of multiple agents. InICLR, 2022. 1, 2, 5, 6

work page 2022
[35]

Episodic transformer for vision-and-language navigation

Alexander Pashevich, Cordelia Schmid, and Chen Sun. Episodic transformer for vision-and-language navigation. In ICCV, 2021. 2

work page 2021
[36]

Scaling instructable agents across many simulated worlds

Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, et al. Scal- ing instructable agents across many simulated worlds.arXiv preprint arXiv:2404.10179, 2024. 2

work page arXiv 2024
[37]

Fjmp: Factorized joint multi-agent motion prediction over learned directed acyclic interaction graphs

Luke Rowe, Martin Ethier, Eli-Henry Dykhne, and Krzysztof Czarnecki. Fjmp: Factorized joint multi-agent motion prediction over learned directed acyclic interaction graphs. InCVPR, 2023. 2

work page 2023
[38]

Helper-x: A unified in- structable embodied agent to tackle four interactive vision- language domains with memory-augmented language mod- els.arXiv preprint arXiv:2404.19065, 2024

Gabriel Sarch, Sahil Somani, Raghav Kapoor, Michael J Tarr, and Katerina Fragkiadaki. Helper-x: A unified in- structable embodied agent to tackle four interactive vision- language domains with memory-augmented language mod- els.arXiv preprint arXiv:2404.19065, 2024. 2

work page arXiv 2024
[39]

Motionlm: Multi-agent motion forecasting as language modeling

Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S Refaat, Rami Al-Rfou, and Benjamin Sapp. Motionlm: Multi-agent motion forecasting as language modeling. InICCV, 2023. 1, 2, 5

work page 2023
[40]

Motion transformer with global intention localization and lo- cal movement refinement.NeurIPS, 2022

Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Motion transformer with global intention localization and lo- cal movement refinement.NeurIPS, 2022. 1, 2, 3, 4, 5

work page 2022
[41]

Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 1, 2, 5

work page 2024
[42]

M2i: From factored marginal trajectory pre- diction to interactive prediction

Qiao Sun, Xin Huang, Junru Gu, Brian C Williams, and Hang Zhao. M2i: From factored marginal trajectory pre- diction to interactive prediction. InCVPR, 2022. 2

work page 2022
[43]

Salmon: Self-alignment with instructable re- ward models

Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Daniel Cox, Yiming Yang, and Chuang Gan. Salmon: Self-alignment with instructable re- ward models. InICLR, 2024. 2

work page 2024
[44]

Words in Motion: Ex- tracting Interpretable Control Vectors for Motion Transform- ers

Omer Sahin Tas and Royden Wagner. Words in Motion: Ex- tracting Interpretable Control Vectors for Motion Transform- ers. InICLR, 2025. 2

work page 2025
[45]

Decision-theoretic mpc: Motion planning with weighted maneuver preferences under uncertainty.arXiv preprint arXiv:2310.17963, 2023

¨Omer S ¸ahin Tas ¸, Philipp Heinrich Brusius, and Christoph Stiller. Decision-theoretic mpc: Motion planning with weighted maneuver preferences under uncertainty.arXiv preprint arXiv:2310.17963, 2023. 1, 2

work page arXiv 2023
[46]

Attention is all you need.NeurIPS, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.NeurIPS, 2017. 3

work page 2017
[47]

PhD thesis, Karlsruher Institut f¨ur Tech- nologie (KIT), 2026

Royden Wagner.Interpretable Representation Learning for Motion Forecasting. PhD thesis, Karlsruher Institut f¨ur Tech- nologie (KIT), 2026. 8

work page 2026
[48]

Redmotion: Motion pre- diction via redundancy reduction.Transactions on Machine Learning Research, 2024

Royden Wagner, Omer Sahin Tas, Marvin Klemp, Carlos Fernandez, and Christoph Stiller. Redmotion: Motion pre- diction via redundancy reduction.Transactions on Machine Learning Research, 2024. 3, 5

work page 2024
[49]

Jointmotion: Joint self-supervision for joint motion prediction

Royden Wagner, Omer Sahin Tas, Marvin Klemp, and Car- los Fernandez Lopez. Jointmotion: Joint self-supervision for joint motion prediction. InCoRL, 2024. 2, 5

work page 2024
[50]

SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts

Royden Wagner, ¨Omer S ¸ahin Tas ¸, Marlon Steiner, Fabian Konstantinidis, Hendrik Konigshof, Marvin Klemp, Car- los Fernandez, and Christoph Stiller. SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts. In IEEE International Conference on Intelligent Transportation Systems (ITSC), 2024. 5

work page 2024
[51]

Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, et al. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. arXiv preprint arXiv:2412.13663, 2024. 2
[52]

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023. 4, 5, 6
[53]

Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, et al. V2x-seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting. In CVPR, 2023. 4, 5
[54]

Zhejun Zhang, Alexander Liniger, Christos Sakaridis, Fisher Yu, and Luc V Gool. Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding. NeurIPS, 2024. 1, 2, 3, 4
[55]

Zikang Zhou, Jianping Wang, Yung-Hui Li, and Yu-Kai Huang. Query-centric trajectory prediction. In CVPR, 2023. 1, 2, 3
[56]

Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, and Yu-Kai Huang. Qcnext: A next-generation framework for joint multi-agent trajectory prediction. arXiv preprint arXiv:2306.10508, 2023. 1, 2, 5
[57]

Hans-Georg Zimmermann, Christoph Tietz, and Ralph Grothmann. Forecasting with recurrent neural networks: 12 tricks. Neural Networks: Tricks of the Trade: Second Edition, pages 687–707, 2012. 2
