arxiv: 2605.10717 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.CV

Heteroscedastic Diffusion for Multi-Agent Trajectory Modeling

Guillem Capellera, Antonio Rubio, Luis Ferraz, Antonio Agudo

Pith reviewed 2026-05-12 05:11 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords heteroscedastic diffusion · multi-agent trajectory modeling · trajectory completion · uncertainty estimation · diffusion models · trajectory forecasting · sports analytics

The pith

A diffusion model unifies trajectory completion and forecasting while estimating state-specific uncertainty and ranking predictions by error likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work seeks to extend diffusion models beyond pure forecasting to also handle trajectory completion, a task needed when tracking data has gaps. It adds state-wise heteroscedastic uncertainty by changing the training loss and moving uncertainty from the model's internal space to actual positions via a linear approximation. A separate ranking network then assigns error probabilities to each generated scene so users can pick the most reliable one. This matters for applications like sports analytics or robotics where incomplete paths and unreliable predictions are common. The resulting method is shown to beat prior approaches on completion and forecasting tasks across several multi-agent sports datasets.

Core claim

U2Diffine augments the standard denoising objective with the negative log-likelihood of the predicted noise to learn heteroscedastic uncertainty, propagates that uncertainty to state space with a first-order Taylor expansion, and pairs the model with a RankNN that predicts per-mode error probabilities, enabling both completion and forecasting with uncertainty awareness.
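
As a rough illustration of the loss augmentation described above, the sketch below adds a Gaussian negative log-likelihood term on the predicted noise to the usual squared-error denoising term. The function name `heteroscedastic_denoising_loss`, the weight `nll_weight`, and the choice of a per-element predicted log-variance are assumptions made for illustration; this summary does not state the paper's exact formulation.

```python
import numpy as np

def heteroscedastic_denoising_loss(eps_true, eps_pred, log_var_pred, nll_weight=1.0):
    """Squared-error denoising term plus a diagonal-Gaussian NLL on the noise.

    All arrays share one shape, e.g. (agents, timesteps, dims).
    `log_var_pred` is a hypothetical extra network output (log sigma^2).
    """
    sq_err = (eps_true - eps_pred) ** 2
    mse = sq_err.mean()
    # Per-element Gaussian NLL, dropping the constant 0.5 * log(2 * pi).
    nll = 0.5 * (sq_err * np.exp(-log_var_pred) + log_var_pred)
    return mse + nll_weight * nll.mean()

# Toy inputs: unit error everywhere and unit predicted variance.
eps_true = np.zeros((2, 3, 2))
eps_pred = np.ones((2, 3, 2))
log_var_pred = np.zeros((2, 3, 2))
loss = heteroscedastic_denoising_loss(eps_true, eps_pred, log_var_pred)
```

Predicting the log-variance rather than the variance keeps the variance positive without an explicit constraint, which is a common design choice for this kind of loss.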

What carries the argument

The augmented denoising loss combined with first-order Taylor propagation of latent uncertainty, plus the RankNN for mode ranking.

If this is right

Trajectory completion becomes feasible in the same framework as forecasting, directly addressing incomplete tracking data.
State-wise uncertainty estimates become available for every agent position and velocity without extra sampling.
Generated scenes can be ranked at inference time by their predicted error probability instead of relying on likelihood alone.
A faster sampling variant achieves the same speed as ordinary generative diffusion while retaining the uncertainty capability.
Performance gains appear consistently on four distinct sports datasets covering basketball, football, and soccer.
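
The ranking step in the list above can be pictured with a minimal sketch, assuming each generated scene comes with one scalar error probability and that a lower value means a more reliable scene. The function name and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def rank_scenes_by_error_probability(error_probs):
    """Return scene indices ordered from most to least reliable.

    `error_probs` holds one hypothetical predicted error probability per
    generated scene; lower is taken to mean more reliable.
    """
    error_probs = np.asarray(error_probs, dtype=float)
    # A stable sort keeps the original order among tied scenes.
    return np.argsort(error_probs, kind="stable")

# Made-up error probabilities for four generated scenes.
order = rank_scenes_by_error_probability([0.40, 0.10, 0.35, 0.15])
best_scene = int(order[0])
```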

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same uncertainty propagation step could be tested on non-sports multi-agent settings such as pedestrian crowds or vehicle fleets to check if the linear approximation still holds.
Error-probability ranking might be combined with downstream decision modules that choose actions only when uncertainty falls below a threshold.
If the ranking network generalizes, it could be attached to other generative models to turn raw samples into ordered, confidence-labeled outputs.

Load-bearing premise

The linear approximation that converts uncertainty inside the model into uncertainty in actual positions and velocities remains accurate enough without large errors from ignored higher-order effects on these movement patterns.
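
One way to probe this premise on a toy problem is to compare the first-order (delta-method) covariance estimate with a Monte Carlo reference. The sketch below is an illustration under stated assumptions: `toy_decoder` is a made-up nonlinear map standing in for the latent-to-state mapping, and a finite-difference Jacobian stands in for automatic differentiation. None of these names come from the paper.

```python
import numpy as np

def numerical_jacobian(f, z, step=1e-6):
    """Central finite-difference Jacobian of f at z (stand-in for autograd)."""
    z = np.asarray(z, dtype=float)
    out_dim = np.asarray(f(z)).size
    jac = np.zeros((out_dim, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = step
        jac[:, i] = (np.asarray(f(z + dz)) - np.asarray(f(z - dz))) / (2.0 * step)
    return jac

def propagate_first_order(f, mean, cov):
    """Delta-method estimate: Cov[f(z)] ~= J Cov[z] J^T, with J taken at the mean."""
    jac = numerical_jacobian(f, mean)
    return jac @ cov @ jac.T

def propagate_monte_carlo(f, mean, cov, n_samples=20000, seed=0):
    """Sampling-based reference estimate of the same covariance."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    outputs = np.array([f(s) for s in samples])
    return np.cov(outputs, rowvar=False)

def toy_decoder(z):
    # Hypothetical nonlinear map, chosen only to exercise the approximation.
    return np.array([z[0] + 0.5 * z[1] ** 2, np.sin(z[0]) + z[1]])

mean = np.array([0.3, -0.2])
cov = np.diag([0.01, 0.04])
linearized = propagate_first_order(toy_decoder, mean, cov)
sampled = propagate_monte_carlo(toy_decoder, mean, cov)
gap = np.abs(linearized - sampled).max()  # grows with curvature and input variance
```

For a linear map the first-order estimate is exact, so any gap beyond sampling noise reflects the neglected higher-order terms.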

What would settle it

The premise would be called into question if the reported uncertainty values showed low or negative correlation with actual squared errors on new multi-agent sequences, or if recomputing the uncertainties with second-order terms in the expansion produced markedly different results.
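
A correlation check of this kind could be sketched as follows, using a Spearman rank correlation between predicted variances and observed squared errors. The helper assumes there are no tied values, and the numbers are made up for illustration; they are not results from the paper.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation, assuming no tied values in either input."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # argsort of argsort converts values to 0-based ranks (valid without ties).
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Made-up values: one predicted variance and one squared error per state.
predicted_variance = np.array([0.02, 0.10, 0.05, 0.30, 0.01])
squared_error = np.array([0.03, 0.08, 0.06, 0.50, 0.02])
rho = spearman_rho(predicted_variance, squared_error)
```

A value of `rho` near 1 would indicate that larger predicted uncertainty tends to accompany larger error; values near zero or below would count against the uncertainty estimates.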

Figures

Figures reproduced from arXiv: 2605.10717 by Antonio Agudo, Antonio Rubio, Guillem Capellera, Luis Ferraz.

**Figure 1.** Uncertainty-aware, unified and interpretable approach for trajectory modeling in multi-agent scenarios. U2Diffine/U2Diff is a diffusion-based model capable of performing trajectory completion tasks such as forecasting, imputation or inferring totally unseen agents, while also jointly estimating state-wise uncertainty. RankNN is a post-processing operation that infers an error probability for each generated…

**Figure 2.** Reverse Gaussian Sampling. Accuracy Rate (AccRate) and Negative Log-Likelihood (NLL) results under different configurations of the Reverse Gaussian Sampling procedure. Specifically, the full Jacobian can be ill-conditioned or contain large off-diagonal elements, introducing numerical instabilities that result in non-positive-definite covariance estimates. To ensure robust and positive-definite covariance e…

**Figure 3.** U2Diffine architecture. The input consists of noisy trajectories (Xs), observations (Xco), a mask (M), and the current denoising step (s). These inputs are embedded and processed through two Residual Denoising Blocks. Within these blocks, a Social-temporal Block combines a Temporal Mamba and a Social Transformer to effectively model complex temporal and social dynamics. Finally the skip connection outputs,…

**Figure 4.** RankNN architecture. The model’s input for each generated scene consists of the mean (X0), the square roots of the covariance (Var(X0)) eigenvalues, and the binary mask (M). These are embedded and processed through a Social-temporal Block. After that, it creates scene-level embeddings by averaging over time and agents. These embeddings are then fed into a Multi-scene Transformer to account for dependencie…

**Figure 5.** Qualitative comparisons in trajectory completion (top) and forecasting (bottom). Top: Comparison with UniTraj [35] on Football-U and Basketball-U datasets, and Ours-20 is the corresponding distribution over 20 generated modes. Bottom: Comparison against AutoBots [7], LED [25] and MoFlow [51] on NBA trajectory forecasting. Ground truth player locations are shown in bright blue and pink, and the ball in gree…

**Figure 6.** Qualitative evaluation of the error correlation. Top: In orange, the AvgUcty versus SADE across the 20 generated modes using U2Diffine of a test scene example. In blue, the error probability e versus SADE. Bottom: Distribution of Spearman correlation coefficients ρ for all four test datasets, using AvgUcty in orange and RankNN predicting e in blue. TABLE III ABLATION STUDY ON U2DIFF/U2DIFFINE ARCHITECTURE …

Original abstract

Multi-agent trajectory modeling traditionally focuses on forecasting, often neglecting more general tasks like trajectory completion, which is essential for real-world applications such as correcting tracking data. Existing methods also generally predict agents' states without offering any state-wise measure of heteroscedastic uncertainty. Moreover, popular multi-modal sampling methods lack error probability estimates for each generated scene under the same prior observations, which makes it difficult to rank the predictions at inference time. We introduce U2Diffine, a unified diffusion model built to perform trajectory completion while simultaneously offering state-wise heteroscedastic uncertainty estimates. This is achieved by augmenting the standard denoising loss with the negative log-likelihood of the predicted noise, and then propagating the latent space uncertainty to the real state space using a first-order Taylor approximation. We also propose U2Diff, a faster baseline that avoids gradient computation during sampling. This approach significantly increases inference speed, making it as efficient as a standard generative-only diffusion model. For post-processing, we integrate a Rank Neural Network (RankNN) that enables error probability estimation for each generated mode, demonstrating strong correlation with ground truth errors. Our method outperforms state-of-the-art solutions in both trajectory completion and forecasting across four challenging sports datasets (NBA, Basketball-U, Football-U, Soccer-U), underscoring the effectiveness of our uncertainty and error probability estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper packages NLL-augmented diffusion training, a first-order Taylor step for state-wise uncertainty, and a RankNN for mode ranking into one trajectory model, but the abstract supplies no numbers or checks on the approximation.

The main contribution is U2Diffine, a diffusion model that handles both trajectory completion and forecasting for multiple agents while producing per-state heteroscedastic uncertainty. They augment the usual denoising objective with negative log-likelihood on the predicted noise, then apply a first-order Taylor expansion to push the latent variance out to real coordinates. A separate RankNN is added afterward to score the error probability of each generated mode. They report better results than prior work on four sports datasets and note that a faster U2Diff variant skips gradients at sampling time to keep inference cheap.

The combination of these pieces for a unified completion-plus-uncertainty model looks new relative to the diffusion trajectory papers cited. The motivation for completion is practical, since real tracking data often needs repair, and having ranked, uncertainty-aware outputs could help downstream correction tasks. The paper does a clean job of stating the gaps in existing generative baselines.

The soft spots are the missing quantitative support. No tables, ablations, or error bars appear in the abstract, so the outperformance claim cannot be assessed. The Taylor linearization is stated without derivation or any test of its accuracy on the non-linear velocity and interaction patterns in the sports data; if curvature terms matter, the uncertainty magnitudes and the RankNN scores could be biased, which would make the reported gains hard to attribute to the uncertainty modeling. The stress-test concern about this approximation therefore stands on the material provided.

This work is aimed at people building generative models for motion or sports analytics who need uncertainty estimates. A reader could borrow the loss augmentation or the ranking network if the experiments later check out. I would bring it to a reading group to talk through the propagation step, but I would not cite it in its current form.

It deserves peer review because the problem is well-posed and the architecture is coherent, even though the validation work still needs to be shown.

Referee Report

2 major / 2 minor

Summary. The manuscript presents U2Diffine, a unified diffusion model for multi-agent trajectory completion and forecasting that provides state-wise heteroscedastic uncertainty estimates. This is accomplished by augmenting the denoising objective with negative log-likelihood and propagating latent uncertainty via a first-order Taylor approximation. Additionally, U2Diff is proposed as a faster baseline, and a Rank Neural Network (RankNN) is used for estimating error probabilities of generated modes. The method is evaluated on four sports datasets (NBA, Basketball-U, Football-U, Soccer-U), claiming outperformance over state-of-the-art in both tasks.

Significance. Should the uncertainty estimates prove reliable and the performance gains hold under rigorous validation, this work could contribute meaningfully to multi-agent modeling by addressing the lack of uncertainty quantification in trajectory prediction. The error probability ranking adds practical value for selecting among multi-modal predictions.

major comments (2)

[Abstract] The central claim of outperforming SOTA on completion and forecasting rests on reliable state-wise heteroscedastic uncertainty, but the abstract reports outperformance without quantitative tables, ablation results, or error-bar details (see reader's take on soundness).
[Method (uncertainty propagation step)] The first-order Taylor approximation used to propagate latent-space uncertainty to real state space (described in the abstract) omits higher-order terms in the denoising network's Jacobian; in multi-agent sports data with velocities and interactions producing locally non-linear mappings, this can bias uncertainty magnitudes and downstream RankNN scores. Validation of approximation accuracy or comparison to Monte Carlo sampling on the target datasets is required to substantiate the gains.
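
To make the concern about omitted higher-order terms concrete, consider a scalar toy case (an illustration only, not the paper's derivation): a smooth map $f$ applied to a Gaussian latent $z \sim \mathcal{N}(\mu, \sigma^2)$. Keeping the second-order term of the Taylor expansion adds a curvature-dependent correction to both the mean and the variance:

```latex
\begin{align}
f(z) &\approx f(\mu) + f'(\mu)\,(z-\mu) + \tfrac{1}{2} f''(\mu)\,(z-\mu)^2, \\
\mathbb{E}[f(z)] &\approx f(\mu) + \tfrac{1}{2} f''(\mu)\,\sigma^2, \\
\operatorname{Var}[f(z)] &\approx
  \underbrace{f'(\mu)^2\,\sigma^2}_{\text{first-order term}}
  + \underbrace{\tfrac{1}{2} f''(\mu)^2\,\sigma^4}_{\text{second-order correction}}.
\end{align}
```

In this toy setting the first-order estimate is adequate when $|f''(\mu)|\,\sigma$ is small relative to $|f'(\mu)|$, which is the kind of condition a validation against sampling would be checking.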

minor comments (2)

[Abstract] The acronyms U2Diffine and U2Diff are introduced without clear expansion or distinction in the abstract.
[Experiments] Ensure consistent dataset naming and full citations (e.g., for NBA, Basketball-U) appear in the experimental section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and outline planned revisions to strengthen the manuscript.

Referee: [Abstract] The central claim of outperforming SOTA on completion and forecasting rests on reliable state-wise heteroscedastic uncertainty, but the abstract reports outperformance without quantitative tables, ablation results, or error-bar details (see reader's take on soundness).

Authors: The abstract is a concise summary of contributions and findings, as is standard; detailed quantitative tables, ablation studies on the uncertainty components, and error bars from multiple runs appear in the Experiments section of the full manuscript. The outperformance claims are substantiated by those results. We can revise the abstract to reference specific quantitative gains if space allows. (Revision: partial.)
Referee: [Method (uncertainty propagation step)] The first-order Taylor approximation used to propagate latent-space uncertainty to real state space (described in the abstract) omits higher-order terms in the denoising network's Jacobian; in multi-agent sports data with velocities and interactions producing locally non-linear mappings, this can bias uncertainty magnitudes and downstream RankNN scores. Validation of approximation accuracy or comparison to Monte Carlo sampling on the target datasets is required to substantiate the gains.

Authors: We acknowledge that the first-order Taylor approximation is an efficiency-driven choice that may not fully capture higher-order non-linear effects from agent interactions. In the revised manuscript we will add a direct comparison of the first-order approximation against Monte Carlo sampling on the NBA and Soccer-U datasets to quantify approximation error and confirm suitability for the reported uncertainty estimates and RankNN scores. (Revision: yes.)

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces U2Diffine by augmenting the denoising loss with NLL of predicted noise and propagating latent uncertainty via first-order Taylor approximation, plus a separate RankNN for mode ranking. These are presented as independent methodological additions rather than re-derivations of fitted quantities or self-referential definitions. Performance gains are reported as empirical results on external sports datasets (NBA, Basketball-U, etc.), with no load-bearing steps that reduce by construction to inputs, self-citations, or renamed known results. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond standard diffusion assumptions and the first-order Taylor step.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

[1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in CVPR, 2016
[2] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in CVPR, 2018
[3] J. Amirian, J.-B. Hayet, and J. Pettré, “Social ways: Learning multi-modal distributions of pedestrian trajectories with gans,” in CVPRW, 2019
[4] V. Kosaraju, A. Sadeghian, R. Martín-Martín, I. Reid, H. Rezatofighi, and S. Savarese, “Social-bigat: Multimodal trajectory forecasting using bicycle-GAN and graph attention networks,” in NeurIPS, 2019
[5] T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajectron++: Multi-agent generative trajectory forecasting with heterogeneous data for control,” in ECCV, 2020
[6] J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling, R. Roelofs, A. Bewley, C. Liu, A. Venugopal et al., “Scene transformer: A unified architecture for predicting multiple agent trajectories,” in ICLR, 2022
[7] R. Girgis, F. Golemo, F. Codevilla, M. Weiss, J. A. D’Souza, S. E. Kahou, F. Heide, and C. Pal, “Latent variable sequential set transformers for joint multi-agent motion prediction,” in ICLR, 2022
[8] I. Navarro and J. Oh, “Social-patteRNN: Socially-aware trajectory prediction guided by motion patterns,” in IROS, 2022
[9] S. Saadatnejad, Y. Gao, K. Messaoud, and A. Alahi, “Social-transmotion: Promptable human trajectory prediction,” in ICLR, 2024
[10] C. Xu, R. T. Tan, Y. Tan, S. Chen, Y. G. Wang, X. Wang, and Y. Wang, “Eqmotion: Equivariant multi-agent motion prediction with invariant interaction reasoning,” in CVPR, 2023
[11] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent network models for human dynamics,” in ICCV, 2015
[12] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-RNN: Deep learning on spatio-temporal graphs,” in CVPR, 2016
[13] J. Martinez, M. J. Black, and J. Romero, “On human motion prediction using recurrent neural networks,” in CVPR, 2017
[14] W. Mao, M. Liu, M. Salzmann, and H. Li, “Learning trajectory dependencies for human motion prediction,” in ICCV, 2019
[15] W. Mao, M. Liu, and M. Salzmann, “History repeats itself: Human motion prediction via motion attention,” in ECCV, 2020
[16] E. Aksan, M. Kaufmann, P. Cao, and O. Hilliges, “A spatio-temporal transformer for 3D human motion prediction,” in 3DV, 2021
[17] Y. Cai, L. Huang, Y. Wang, T.-J. Cham, J. Cai, J. Yuan, J. Liu, X. Yang, Y. Zhu, X. Shen et al., “Learning progressive joint propagation for human motion prediction,” in ECCV, 2020
[18] W. Guo, Y. Du, X. Shen, V. Lepetit, X. Alameda-Pineda, and F. Moreno-Noguer, “Back to MLP: A simple baseline for human motion prediction,” in WACV, 2023
[19] S. Zheng, Y. Yue, and J. Hobbs, “Generating long-term trajectories using deep hierarchical networks,” in NeurIPS, 2016
[20] E. Zhan, S. Zheng, Y. Yue, L. Sha, and P. Lucey, “Generating multi-agent trajectories using programmatic weak supervision,” in ICLR, 2019
[21] M. A. Alcorn and A. Nguyen, “baller2vec++: A look-ahead multi-entity transformer for modeling coordinated agents,” arXiv preprint arXiv:2104.11980, 2021
[22] B. Hu and T.-J. Cham, “Entry-flipped transformer for inference and prediction of participant behavior,” in ECCV, 2022
[23] G. Capellera, L. Ferraz, A. Rubio, A. Agudo, and F. Moreno-Noguer, “Footbots: A transformer-based architecture for motion prediction in soccer,” in ICIP, 2024
[24] M. Peral, G. Capellera, A. Rubio, L. Ferraz, F. Moreno-Noguer, and A. Agudo, “Temporally accurate events detection through ball possessor recognition in soccer,” in VISAPP, 2025
[25] W. Mao, C. Xu, Q. Zhu, S. Chen, and Y. Wang, “Leapfrog diffusion model for stochastic trajectory prediction,” in CVPR, 2023
[26] Y. Xu, A. Bazarjani, H.-g. Chi, C. Choi, and Y. Fu, “Uncovering the missing pattern: Unified framework towards trajectory imputation and prediction,” in CVPR, 2023, pp. 9632–9643
[27] P. Dendorfer, S. Elflein, and L. Leal-Taixé, “MG-GAN: A multi-generator model preventing out-of-distribution samples in pedestrian trajectory prediction,” in ICCV, 2021
[28] Y. Yuan, X. Weng, Y. Ou, and K. M. Kitani, “Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting,” in ICCV, 2021
[29] C. Xu, M. Li, Z. Ni, Y. Zhang, and S. Chen, “Groupnet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning,” in CVPR, 2022
[30] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in NeurIPS, 2020
[31] T. Gu, G. Chen, J. Li, C. Lin, Y. Rao, J. Zhou, and J. Lu, “Stochastic trajectory prediction via motion indeterminacy diffusion,” in CVPR, 2022
[32] Y. Liu, R. Yu, S. Zheng, E. Zhan, and Y. Yue, “Naomi: Non-autoregressive multiresolution sequence imputation,” in NeurIPS, 2019
[33] M. Qi, J. Qin, Y. Wu, and Y. Yang, “Imitative non-autoregressive modeling for trajectory forecasting and imputation,” in CVPR, 2020
[34] H. Kim, H.-J. Choi, C. J. Kim, J. Yoon, and S.-K. Ko, “Ball trajectory inference from multi-agent sports contexts using set transformer and hierarchical bi-LSTM,” arXiv preprint arXiv:2306.08206, 2023
[35] Y. Xu and Y. Fu, “Sports-traj: A unified trajectory generation model for multi-agent movement in sports,” in ICLR, 2025
[36] G. Capellera, L. Ferraz, A. Rubio, A. Agudo, and F. Moreno-Noguer, “TranSPORTmer: A holistic approach to trajectory understanding in multi-agent sports,” in ACCV, 2024
[37] G. Capellera, A. Rubio, L. Ferraz, and A. Agudo, “Unified uncertainty-aware diffusion for multi-agent trajectory modeling,” in CVPR, 2025
[38] P. Felsen, P. Lucey, and S. Ganguly, “Where will they go? predicting fine-grained adversarial multi-agent motion using conditional variational autoencoders,” in ECCV, 2018
[39] C. Sun, P. Karlsson, J. Wu, J. B. Tenenbaum, and K. Murphy, “Stochastic prediction of multi-agent interactions from partial observations,” arXiv preprint arXiv:1902.09641, 2019
[40] R. A. Yeh, A. G. Schwing, J. Huang, and K. Murphy, “Diverse generation for multi-agent sports games,” in CVPR, 2019
[41] L. Fang, Q. Jiang, J. Shi, and B. Zhou, “TPNet: Trajectory proposal network for motion prediction,” in CVPR, 2020
[42] Y. Hu, S. Chen, Y. Zhang, and X. Gu, “Collaborative motion prediction via neural motion message passing,” in CVPR, 2020
[43] A. Sadeghian, V. Kosaraju, A. Sadeghian, N. Hirose, H. Rezatofighi, and S. Savarese, “Sophie: An attentive gan for predicting paths compliant to social and physical constraints,” in CVPR, 2019
[44] K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, and A. Gaidon, “It is not the journey but the destination: Endpoint conditioned trajectory prediction,” in ECCV, 2020
[45] M. Lee, S. S. Sohn, S. Moon, S. Yoon, M. Kapadia, and V. Pavlovic, “Muse-vae: Multi-scale VAE for environment-aware long term trajectory prediction,” in CVPR, 2022
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 13
[46] C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in CVPR, 2023
[47] D. Rempe, Z. Luo, X. Bin Peng, Y. Yuan, K. Kitani, K. Kreis, S. Fidler, and O. Litany, “Trace and pace: Controllable pedestrian animation via guided trajectory diffusion,” in CVPR, 2023
[48] I. Bae, Y.-J. Park, and H.-G. Jeon, “Singulartrajectory: Universal trajectory predictor using diffusion model,” in CVPR, 2024
[49] R. Li, C. Li, D. Ren, G. Chen, Y. Yuan, and G. Wang, “Bcdiff: Bidirectional consistent diffusion for instantaneous trajectory prediction,” in NeurIPS, 2023
[50] B. Yang, H. Su, N. Gkanatsios, T.-W. Ke, A. Jain, J. Schneider, and K. Fragkiadaki, “Diffusion-es: Gradient-free planning with diffusion for autonomous driving and zero-shot instruction following,” in CVPR, 2024
[51] Y. Fu, Q. Yan, L. Wang, K. Li, and R. Liao, “Moflow: One-step flow matching for human trajectory forecasting via implicit maximum likelihood estimation based distillation,” in CVPR, 2025
[52] S. Lee, J. Lee, Y. Yu, T. Kim, and K. Lee, “MART: Multiscale relational transformer networks for multi-agent trajectory prediction,” in ECCV, 2024
[53] Y. Gao, P.-C. Luan, and A. Alahi, “Multi-transmotion: Pre-trained model for human motion prediction,” in CoRL, 2024
[54] S. Omidshafiei, D. Hennes, M. Garnelo, Z. Wang, A. Recasens, E. Tarassov, Y. Yang, R. Elie, J. T. Connor, P. Muller et al., “Multiagent off-screen behavior prediction in football,” Scientific reports, 2022
[55] Y. Tashiro, J. Song, Y. Song, and S. Ermon, “Csdi: Conditional score-based diffusion models for probabilistic time series imputation,” in NeurIPS, 2021
[56] J. M. L. Alcaraz and N. Strodthoff, “Diffusion-based time series imputation and forecasting with structured state space models,” TMLR, 2022
[57] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017
[58] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023
[59] B. Tang, Y. Zhong, C. Xu, W.-T. Wu, U. Neumann, Y. Zhang, S. Chen, and Y. Wang, “Collaborative uncertainty benefits multi-agent multi-modal trajectory forecasting,” TPAMI, 2023
[60] S. Saadatnejad, M. Mirmohammadi, M. Daghyani, P. Saremi, Y. Z. Benisi, A. Alimohammadi, Z. Tehraninasab, T. Mordan, and A. Alahi, “Toward reliable human pose forecasting with uncertainty,” RA-L, 2024
[61] K. Manas, C. Schlauch, A. Paschke, C. Wirth, and N. Klein, “Uncertainty-aware trajectory prediction via rule-regularized heteroscedastic deep classification,” in RSS, 2025
[62] S. Kou, L. Gan, D. Wang, C. Li, and Z. Deng, “Bayesdiff: Estimating pixel-wise uncertainty in diffusion via bayesian inference,” in ICLR, 2024
[63] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, “Multipath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction,” in CoRL, 2019
[64] T. Phan-Minh, E. C. Grigore, F. A. Boulton, O. Beijbom, and E. M. Wolff, “Covernet: Multimodal behavior prediction using trajectory sets,” in CVPR, 2020
[65] L. Shi, L. Wang, S. Zhou, and G. Hua, “Trajectory unified transformer for pedestrian trajectory prediction,” in ICCV, 2023
[66] H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen, Y. Shen, Y. Chai, C. Schmid et al., “TNT: Target-driven trajectory prediction,” in CoRL, 2021
[67] S. Shi, L. Jiang, D. Dai, and B. Schiele, “Motion transformer with global intention localization and local movement refinement,” in NeurIPS, 2022
[68] Q. Yan, B. Zhang, Y. Zhang, D. Yang, J. White, D. Chen, J. Liu, L. Liu, B. Zhuang, S. Shi et al., “Trajflow: Multi-modal motion prediction via flow matching,” arXiv preprint arXiv:2506.08541, 2025
[69] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in ICLR, 2021
[70] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in ICML, 2021
[71] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in ICLR, 2021
[72] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL-HLT, 2019
[73] M. Blondel, O. Teboul, Q. Berthet, and J. Djolonga, “Fast differentiable sorting and ranking,” in ICML, 2020
[74] A. Monti, A. Bertugli, S. Calderara, and R. Cucchiara, “Dag-net: Double attentive graph neural network for trajectory forecasting,” in ICPR, 2021
[75] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 1997
[76] C. Xu, W. Mao, W. Zhang, and S. Chen, “Remember intentions: Retrospective-memory-based trajectory prediction,” in CVPR, 2022
[77] I. Bae, J.-H. Park, and H.-G. Jeon, “Non-probability sampling network for stochastic human trajectory prediction,” in CVPR, 2022

Guillem Capellera received the B.Sc. degree in Sports Science in 2017, B.Sc. degree in Mathematics and Physics in 2021, from University of Barcelona (UB); he then earned an M.Sc. degree in Computer Vision from Autonomous Universi...