pith. machine review for the scientific record.

arxiv: 2605.10717 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.CV

Heteroscedastic Diffusion for Multi-Agent Trajectory Modeling

Pith reviewed 2026-05-12 05:11 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords heteroscedastic diffusion, multi-agent trajectory modeling, trajectory completion, uncertainty estimation, diffusion models, trajectory forecasting, sports analytics

The pith

A diffusion model unifies trajectory completion and forecasting while estimating state-specific uncertainty and ranking predictions by error likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work seeks to extend diffusion models beyond pure forecasting to also handle trajectory completion, a task needed when tracking data has gaps. It adds state-wise heteroscedastic uncertainty by changing the training loss and moving uncertainty from the model's internal space to actual positions via a linear approximation. A separate ranking network then assigns error probabilities to each generated scene so users can pick the most reliable one. This matters for applications like sports analytics or robotics where incomplete paths and unreliable predictions are common. The resulting method is shown to beat prior approaches on completion and forecasting tasks across several multi-agent sports datasets.

Core claim

U2Diffine augments the standard denoising objective with the negative log-likelihood of the predicted noise to learn heteroscedastic uncertainty, propagates that uncertainty to state space with a first-order Taylor expansion, and pairs the model with a RankNN that predicts per-mode error probabilities, enabling both completion and forecasting with uncertainty awareness.
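
The augmented objective described in the core claim can be sketched as follows. This is a minimal illustration, assuming a per-component Gaussian likelihood with a predicted log-variance and a scalar weight; the function name `denoising_loss_with_nll`, the log-variance parameterization, and `nll_weight` are assumptions made here and are not confirmed by the abstract.

```python
import math


def denoising_loss_with_nll(eps_true, eps_pred, log_var, nll_weight=1.0):
    """Mean squared denoising error plus a Gaussian negative log-likelihood term.

    eps_true, eps_pred and log_var are flat lists of equal length.
    log_var[i] is the predicted log-variance of the i-th noise component
    (a hypothetical extra output of the denoising network).
    """
    n = len(eps_true)
    # Standard denoising term: mean squared error on the noise.
    mse = sum((t - p) ** 2 for t, p in zip(eps_true, eps_pred)) / n
    # Heteroscedastic term: Gaussian NLL with a per-component variance.
    nll = sum(
        0.5 * (lv + (t - p) ** 2 / math.exp(lv) + math.log(2.0 * math.pi))
        for t, p, lv in zip(eps_true, eps_pred, log_var)
    ) / n
    return mse + nll_weight * nll


loss = denoising_loss_with_nll([0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

With this form, a component with a large residual is penalized less when the predicted variance is correspondingly large, which is what lets the variance head learn state-dependent uncertainty.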

What carries the argument

The augmented denoising loss combined with first-order Taylor propagation of latent uncertainty, plus the RankNN for mode ranking.
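
A first-order Taylor (delta-method) propagation of this kind can be sketched on a toy map. The example below assumes independent latent components with variances `var` and approximates the output covariance as J diag(var) J^T; the finite-difference Jacobian and the function `toy_decoder` are stand-ins chosen for illustration, not the paper's network or its gradient computation.

```python
def jacobian_fd(f, x, h=1e-6):
    """Central finite-difference Jacobian of f: R^n -> R^m at x (list of floats)."""
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


def propagate_variance(f, mean, var):
    """First-order covariance of f(z) for z with mean `mean` and diagonal variance `var`."""
    jac = jacobian_fd(f, mean)
    m, n = len(jac), len(mean)
    return [
        [sum(jac[i][k] * var[k] * jac[r][k] for k in range(n)) for r in range(m)]
        for i in range(m)
    ]


def toy_decoder(z):
    # Hypothetical nonlinear map from a 2-D latent to a 2-D state.
    return [z[0] + 0.5 * z[1], z[0] * z[1]]


cov = propagate_variance(toy_decoder, [1.0, 2.0], [0.04, 0.01])
```

At the point (1, 2) the Jacobian is [[1, 0.5], [2, 1]], so the sketch yields a full 2x2 covariance even though the latent variances are diagonal.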

If this is right

  • Trajectory completion becomes feasible in the same framework as forecasting, directly addressing incomplete tracking data.
  • State-wise uncertainty estimates become available for every agent position and velocity without extra sampling.
  • Generated scenes can be ranked at inference time by their predicted error probability instead of relying on likelihood alone.
  • The faster variant, U2Diff, avoids gradient computation during sampling and so runs as fast as a standard generative-only diffusion model.
  • Performance gains appear consistently on four distinct sports datasets covering basketball, football, and soccer.
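
The inference-time ranking in the third point can be sketched as a simple sort. The example assumes that each generated scene comes with one scalar error probability and that a lower value means a more reliable scene; `rank_modes` and the sample values are hypothetical.

```python
def rank_modes(error_probs):
    """Return scene indices ordered from lowest to highest predicted error probability."""
    return sorted(range(len(error_probs)), key=lambda i: error_probs[i])


# Hypothetical error probabilities for three generated scenes.
error_probs = [0.30, 0.10, 0.60]
order = rank_modes(error_probs)
best_scene = order[0]
```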

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same uncertainty propagation step could be tested on non-sports multi-agent settings such as pedestrian crowds or vehicle fleets to check if the linear approximation still holds.
  • Error-probability ranking might be combined with downstream decision modules that choose actions only when uncertainty falls below a threshold.
  • If the ranking network generalizes, it could be attached to other generative models to turn raw samples into ordered, confidence-labeled outputs.

Load-bearing premise

The linear approximation that converts uncertainty inside the model into uncertainty in actual positions and velocities is accurate enough for these movement patterns, meaning the higher-order terms it ignores do not introduce large errors.

What would settle it

The claim would be undermined if the reported uncertainty values show low or negative correlation with actual squared errors on new multi-agent sequences, or if recomputing uncertainties with second-order terms in the expansion produces markedly different results.
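
The first of these checks can be sketched as a Spearman rank correlation between predicted uncertainties and observed squared errors. This is a plain-Python illustration with average ranks for ties; the function names are assumptions, and the sketch does not handle constant inputs (which would divide by zero).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A coefficient near 1 would indicate that larger predicted uncertainty goes with larger error; values near zero or below would count against the claim.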

Figures

Figures reproduced from arXiv: 2605.10717 by Antonio Agudo, Antonio Rubio, Guillem Capellera, Luis Ferraz.

Figure 1: Uncertainty-aware, unified and interpretable approach for trajectory modeling in multi-agent scenarios. U2Diffine/U2Diff is a diffusion-based model capable of performing trajectory completion tasks such as forecasting, imputation or inferring totally unseen agents, while also jointly estimating state-wise uncertainty. RankNN is a post-processing operation that infers an error probability for each generated…
Figure 2: Reverse Gaussian Sampling. Accuracy Rate (AccRate) and Negative Log-Likelihood (NLL) results under different configurations of the Reverse Gaussian Sampling procedure. Specifically, the full Jacobian can be ill-conditioned or contain large off-diagonal elements, introducing numerical instabilities that result in non-positive-definite covariance estimates. To ensure robust and positive-definite covariance e…
Figure 3: U2Diffine architecture. The input consists of noisy trajectories (Xs), observations (Xco), a mask (M), and the current denoising step (s). These inputs are embedded and processed through two Residual Denoising Blocks. Within these blocks, a Social-temporal Block combines a Temporal Mamba and a Social Transformer to effectively model complex temporal and social dynamics. Finally the skip connection outputs,…
Figure 4: RankNN architecture. The model’s input for each generated scene consists of the mean (X0), the square roots of the covariance (Var(X0)) eigenvalues, and the binary mask (M). These are embedded and processed through a Social-temporal Block. After that, it creates scene-level embeddings by averaging over time and agents. These embeddings are then fed into a Multi-scene Transformer to account for dependencie…
Figure 5: Qualitative comparisons in trajectory completion (top) and forecasting (bottom). Top: Comparison with UniTraj [35] on Football-U and Basketball-U datasets, and Ours-20 is the corresponding distribution over 20 generated modes. Bottom: Comparison against AutoBots [7], LED [25] and MoFlow [51] on NBA trajectory forecasting. Ground truth player locations are shown in bright blue and pink, and the ball in gree…
Figure 6: Qualitative evaluation of the error correlation. Top: In orange, the AvgUcty versus SADE across the 20 generated modes using U2Diffine of a test scene example. In blue, the error probability e versus SADE. Bottom: Distribution of Spearman correlation coefficients ρ for all four test datasets, using AvgUcty in orange and RankNN predicting e in blue. TABLE III ABLATION STUDY ON U2DIFF/U2DIFFINE ARCHITECTURE …
Original abstract

Multi-agent trajectory modeling traditionally focuses on forecasting, often neglecting more general tasks like trajectory completion, which is essential for real-world applications such as correcting tracking data. Existing methods also generally predict agents' states without offering any state-wise measure of heteroscedastic uncertainty. Moreover, popular multi-modal sampling methods lack error probability estimates for each generated scene under the same prior observations, which makes it difficult to rank the predictions at inference time. We introduce U2Diffine, a unified diffusion model built to perform trajectory completion while simultaneously offering state-wise heteroscedastic uncertainty estimates. This is achieved by augmenting the standard denoising loss with the negative log-likelihood of the predicted noise, and then propagating the latent space uncertainty to the real state space using a first-order Taylor approximation. We also propose U2Diff, a faster baseline that avoids gradient computation during sampling. This approach significantly increases inference speed, making it as efficient as a standard generative-only diffusion model. For post-processing, we integrate a Rank Neural Network (RankNN) that enables error probability estimation for each generated mode, demonstrating strong correlation with ground truth errors. Our method outperforms state-of-the-art solutions in both trajectory completion and forecasting across four challenging sports datasets (NBA, Basketball-U, Football-U, Soccer-U), underscoring the effectiveness of our uncertainty and error probability estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents U2Diffine, a unified diffusion model for multi-agent trajectory completion and forecasting that provides state-wise heteroscedastic uncertainty estimates. This is accomplished by augmenting the denoising objective with negative log-likelihood and propagating latent uncertainty via a first-order Taylor approximation. Additionally, U2Diff is proposed as a faster baseline, and a Rank Neural Network (RankNN) is used for estimating error probabilities of generated modes. The method is evaluated on four sports datasets (NBA, Basketball-U, Football-U, Soccer-U), claiming outperformance over state-of-the-art in both tasks.

Significance. Should the uncertainty estimates prove reliable and the performance gains hold under rigorous validation, this work could contribute meaningfully to multi-agent modeling by addressing the lack of uncertainty quantification in trajectory prediction. The error probability ranking adds practical value for selecting among multi-modal predictions.

major comments (2)
  1. [Abstract] The central claim of outperforming SOTA on completion and forecasting rests on reliable state-wise heteroscedastic uncertainty, but the abstract reports outperformance without quantitative tables, ablation results, or error-bar details (see reader's take on soundness).
  2. [Method (uncertainty propagation step)] The first-order Taylor approximation used to propagate latent-space uncertainty to real state space (described in the abstract) omits higher-order terms in the denoising network's Jacobian; in multi-agent sports data with velocities and interactions producing locally non-linear mappings, this can bias uncertainty magnitudes and downstream RankNN scores. Validation of approximation accuracy or comparison to Monte Carlo sampling on the target datasets is required to substantiate the gains.
minor comments (2)
  1. [Abstract] The acronyms U2Diffine and U2Diff are introduced without clear expansion or distinction in the abstract.
  2. [Experiments] Ensure consistent dataset naming and full citations (e.g., for NBA, Basketball-U) appear in the experimental section.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and outline planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim of outperforming SOTA on completion and forecasting rests on reliable state-wise heteroscedastic uncertainty, but the abstract reports outperformance without quantitative tables, ablation results, or error-bar details (see reader's take on soundness).

    Authors: The abstract is a concise summary of contributions and findings, as is standard; detailed quantitative tables, ablation studies on the uncertainty components, and error bars from multiple runs appear in the Experiments section of the full manuscript. The outperformance claims are substantiated by those results. We can revise the abstract to reference specific quantitative gains if space allows. revision: partial

  2. Referee: [Method (uncertainty propagation step)] The first-order Taylor approximation used to propagate latent-space uncertainty to real state space (described in the abstract) omits higher-order terms in the denoising network's Jacobian; in multi-agent sports data with velocities and interactions producing locally non-linear mappings, this can bias uncertainty magnitudes and downstream RankNN scores. Validation of approximation accuracy or comparison to Monte Carlo sampling on the target datasets is required to substantiate the gains.

    Authors: We acknowledge that the first-order Taylor approximation is an efficiency-driven choice that may not fully capture higher-order non-linear effects from agent interactions. In the revised manuscript we will add a direct comparison of the first-order approximation against Monte Carlo sampling on the NBA and Soccer-U datasets to quantify approximation error and confirm suitability for the reported uncertainty estimates and RankNN scores. revision: yes
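
The kind of comparison promised here can be sketched on a one-dimensional toy map. The example assumes a Gaussian latent and the map g(z) = z^2, for which the exact variance is 4·mu^2·sigma^2 + 2·sigma^4; the function names, the sample count, and the seed are illustrative choices and say nothing about the authors' planned experiment.

```python
import random


def first_order_variance(dg, mu, sigma):
    """Delta-method variance of g(z) for z ~ N(mu, sigma^2): g'(mu)^2 * sigma^2."""
    return (dg(mu) ** 2) * sigma ** 2


def monte_carlo_variance(g, mu, sigma, n=20000, seed=0):
    """Sample variance of g(z) over n Gaussian draws (unbiased estimator)."""
    rng = random.Random(seed)
    ys = [g(rng.gauss(mu, sigma)) for _ in range(n)]
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / (n - 1)


def g(z):
    return z * z


def dg(z):
    return 2.0 * z


# Small input noise: the linearization is close to the sampled variance.
small_fo = first_order_variance(dg, 1.0, 0.1)
small_mc = monte_carlo_variance(g, 1.0, 0.1)

# Large input noise: the ignored second-order term (2 * sigma^4) becomes visible.
large_fo = first_order_variance(dg, 1.0, 1.0)
large_mc = monte_carlo_variance(g, 1.0, 1.0)
```

In this toy setting the two estimates agree closely when sigma is small and diverge as sigma grows, which is the behavior the referee asks to be quantified on the target datasets.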

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces U2Diffine by augmenting the denoising loss with NLL of predicted noise and propagating latent uncertainty via first-order Taylor approximation, plus a separate RankNN for mode ranking. These are presented as independent methodological additions rather than re-derivations of fitted quantities or self-referential definitions. Performance gains are reported as empirical results on external sports datasets (NBA, Basketball-U, etc.), with no load-bearing steps that reduce by construction to inputs, self-citations, or renamed known results. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond standard diffusion assumptions and the first-order Taylor step.

pith-pipeline@v0.9.0 · 5537 in / 1061 out tokens · 27302 ms · 2026-05-12T05:11:31.022356+00:00 · methodology

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Social LSTM: Human trajectory prediction in crowded spaces,

    A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in CVPR, 2016

  2. [2]

    Social GAN: Socially acceptable trajectories with generative adversarial networks,

    A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in CVPR, 2018

  3. [3]

    Social ways: Learning multi-modal distributions of pedestrian trajectories with gans,

    J. Amirian, J.-B. Hayet, and J. Pettré, “Social ways: Learning multi-modal distributions of pedestrian trajectories with gans,” in CVPRW, 2019

  4. [4]

    Social-bigat: Multimodal trajectory forecasting using bicycle-GAN and graph attention networks,

    V. Kosaraju, A. Sadeghian, R. Martín-Martín, I. Reid, H. Rezatofighi, and S. Savarese, “Social-bigat: Multimodal trajectory forecasting using bicycle-GAN and graph attention networks,” in NeurIPS, 2019

  5. [5]

    Trajectron++: Multi-agent generative trajectory forecasting with heterogeneous data for control,

    T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajectron++: Multi-agent generative trajectory forecasting with heterogeneous data for control,” in ECCV, 2020

  6. [6]

    Scene transformer: A unified architecture for predicting multiple agent trajectories,

    J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling, R. Roelofs, A. Bewley, C. Liu, A. Venugopal et al., “Scene transformer: A unified architecture for predicting multiple agent trajectories,” in ICLR, 2022

  7. [7]

    Latent variable sequential set transformers for joint multi-agent motion prediction,

    R. Girgis, F. Golemo, F. Codevilla, M. Weiss, J. A. D’Souza, S. E. Kahou, F. Heide, and C. Pal, “Latent variable sequential set transformers for joint multi-agent motion prediction,” in ICLR, 2022

  8. [8]

    Social-patteRNN: Socially-aware trajectory prediction guided by motion patterns,

    I. Navarro and J. Oh, “Social-patteRNN: Socially-aware trajectory prediction guided by motion patterns,” in IROS, 2022

  9. [9]

    Social-transmotion: Promptable human trajectory prediction,

    S. Saadatnejad, Y. Gao, K. Messaoud, and A. Alahi, “Social-transmotion: Promptable human trajectory prediction,” in ICLR, 2024

  10. [10]

    Eqmotion: Equivariant multi-agent motion prediction with invariant interaction reasoning,

    C. Xu, R. T. Tan, Y. Tan, S. Chen, Y. G. Wang, X. Wang, and Y. Wang, “Eqmotion: Equivariant multi-agent motion prediction with invariant interaction reasoning,” in CVPR, 2023

  11. [11]

    Recurrent network models for human dynamics,

    K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent network models for human dynamics,” in ICCV, 2015

  12. [12]

    Structural-RNN: Deep learning on spatio-temporal graphs,

    A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-RNN: Deep learning on spatio-temporal graphs,” in CVPR, 2016

  13. [13]

    On human motion prediction using recurrent neural networks,

    J. Martinez, M. J. Black, and J. Romero, “On human motion prediction using recurrent neural networks,” in CVPR, 2017

  14. [14]

    Learning trajectory dependencies for human motion prediction,

    W. Mao, M. Liu, M. Salzmann, and H. Li, “Learning trajectory dependencies for human motion prediction,” in ICCV, 2019

  15. [15]

    History repeats itself: Human motion prediction via motion attention,

    W. Mao, M. Liu, and M. Salzmann, “History repeats itself: Human motion prediction via motion attention,” in ECCV, 2020

  16. [16]

    A spatio-temporal transformer for 3D human motion prediction,

    E. Aksan, M. Kaufmann, P. Cao, and O. Hilliges, “A spatio-temporal transformer for 3D human motion prediction,” in 3DV, 2021

  17. [17]

    Learning progressive joint propagation for human motion prediction,

    Y. Cai, L. Huang, Y. Wang, T.-J. Cham, J. Cai, J. Yuan, J. Liu, X. Yang, Y. Zhu, X. Shen et al., “Learning progressive joint propagation for human motion prediction,” in ECCV, 2020

  18. [18]

    Back to MLP: A simple baseline for human motion prediction,

    W. Guo, Y. Du, X. Shen, V. Lepetit, X. Alameda-Pineda, and F. Moreno-Noguer, “Back to MLP: A simple baseline for human motion prediction,” in WACV, 2023

  19. [19]

    Generating long-term trajectories using deep hierarchical networks,

    S. Zheng, Y. Yue, and J. Hobbs, “Generating long-term trajectories using deep hierarchical networks,” in NeurIPS, 2016

  20. [20]

    Generating multi-agent trajectories using programmatic weak supervision,

    E. Zhan, S. Zheng, Y. Yue, L. Sha, and P. Lucey, “Generating multi-agent trajectories using programmatic weak supervision,” in ICLR, 2019

  21. [21]

    baller2vec++: A look-ahead multi-entity transformer for modeling coordinated agents,

    M. A. Alcorn and A. Nguyen, “baller2vec++: A look-ahead multi-entity transformer for modeling coordinated agents,” arXiv preprint arXiv:2104.11980, 2021

  22. [22]

    Entry-flipped transformer for inference and prediction of participant behavior,

    B. Hu and T.-J. Cham, “Entry-flipped transformer for inference and prediction of participant behavior,” in ECCV, 2022

  23. [23]

    Footbots: A transformer-based architecture for motion prediction in soccer,

    G. Capellera, L. Ferraz, A. Rubio, A. Agudo, and F. Moreno-Noguer, “Footbots: A transformer-based architecture for motion prediction in soccer,” in ICIP, 2024

  24. [24]

    Temporally accurate events detection through ball possessor recognition in soccer,

    M. Peral, G. Capellera, A. Rubio, L. Ferraz, F. Moreno-Noguer, and A. Agudo, “Temporally accurate events detection through ball possessor recognition in soccer,” in VISAPP, 2025

  25. [25]

    Leapfrog diffusion model for stochastic trajectory prediction,

    W. Mao, C. Xu, Q. Zhu, S. Chen, and Y. Wang, “Leapfrog diffusion model for stochastic trajectory prediction,” in CVPR, 2023

  26. [26]

    Uncovering the missing pattern: Unified framework towards trajectory imputation and prediction,

    Y. Xu, A. Bazarjani, H.-g. Chi, C. Choi, and Y. Fu, “Uncovering the missing pattern: Unified framework towards trajectory imputation and prediction,” in CVPR, 2023, pp. 9632–9643

  27. [27]

    MG-GAN: A multi-generator model preventing out-of-distribution samples in pedestrian trajectory prediction,

    P. Dendorfer, S. Elflein, and L. Leal-Taixé, “MG-GAN: A multi-generator model preventing out-of-distribution samples in pedestrian trajectory prediction,” in ICCV, 2021

  28. [28]

    Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting,

    Y. Yuan, X. Weng, Y. Ou, and K. M. Kitani, “Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting,” in ICCV, 2021

  29. [29]

    Groupnet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning,

    C. Xu, M. Li, Z. Ni, Y. Zhang, and S. Chen, “Groupnet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning,” in CVPR, 2022

  30. [30]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in NeurIPS, 2020

  31. [31]

    Stochastic trajectory prediction via motion indeterminacy diffusion,

    T. Gu, G. Chen, J. Li, C. Lin, Y. Rao, J. Zhou, and J. Lu, “Stochastic trajectory prediction via motion indeterminacy diffusion,” in CVPR, 2022

  32. [32]

    Naomi: Non-autoregressive multiresolution sequence imputation,

    Y. Liu, R. Yu, S. Zheng, E. Zhan, and Y. Yue, “Naomi: Non-autoregressive multiresolution sequence imputation,” in NeurIPS, 2019

  33. [33]

    Imitative non-autoregressive modeling for trajectory forecasting and imputation,

    M. Qi, J. Qin, Y. Wu, and Y. Yang, “Imitative non-autoregressive modeling for trajectory forecasting and imputation,” in CVPR, 2020

  34. [34]

    Ball trajectory inference from multi-agent sports contexts using set transformer and hierarchical bi-LSTM,

    H. Kim, H.-J. Choi, C. J. Kim, J. Yoon, and S.-K. Ko, “Ball trajectory inference from multi-agent sports contexts using set transformer and hierarchical bi-LSTM,” arXiv preprint arXiv:2306.08206, 2023

  35. [35]

    Sports-traj: A unified trajectory generation model for multi-agent movement in sports,

    Y. Xu and Y. Fu, “Sports-traj: A unified trajectory generation model for multi-agent movement in sports,” in ICLR, 2025

  36. [36]

    TranSPORTmer: A holistic approach to trajectory understanding in multi-agent sports,

    G. Capellera, L. Ferraz, A. Rubio, A. Agudo, and F. Moreno-Noguer, “TranSPORTmer: A holistic approach to trajectory understanding in multi-agent sports,” in ACCV, 2024

  37. [37]

    Unified uncertainty-aware diffusion for multi-agent trajectory modeling,

    G. Capellera, A. Rubio, L. Ferraz, and A. Agudo, “Unified uncertainty-aware diffusion for multi-agent trajectory modeling,” in CVPR, 2025

  38. [38]

    Where will they go? predicting fine-grained adversarial multi-agent motion using conditional variational autoencoders,

    P. Felsen, P. Lucey, and S. Ganguly, “Where will they go? predicting fine-grained adversarial multi-agent motion using conditional variational autoencoders,” in ECCV, 2018

  39. [39]

    Stochastic prediction of multi-agent interactions from partial observations,

    C. Sun, P. Karlsson, J. Wu, J. B. Tenenbaum, and K. Murphy, “Stochastic prediction of multi-agent interactions from partial observations,” arXiv preprint arXiv:1902.09641, 2019

  40. [40]

    Diverse generation for multi-agent sports games,

    R. A. Yeh, A. G. Schwing, J. Huang, and K. Murphy, “Diverse generation for multi-agent sports games,” in CVPR, 2019

  41. [41]

    TPNet: Trajectory proposal network for motion prediction,

    L. Fang, Q. Jiang, J. Shi, and B. Zhou, “TPNet: Trajectory proposal network for motion prediction,” in CVPR, 2020

  42. [42]

    Collaborative motion prediction via neural motion message passing,

    Y. Hu, S. Chen, Y. Zhang, and X. Gu, “Collaborative motion prediction via neural motion message passing,” in CVPR, 2020

  43. [43]

    Sophie: An attentive gan for predicting paths compliant to social and physical constraints,

    A. Sadeghian, V. Kosaraju, A. Sadeghian, N. Hirose, H. Rezatofighi, and S. Savarese, “Sophie: An attentive gan for predicting paths compliant to social and physical constraints,” in CVPR, 2019

  44. [44]

    It is not the journey but the destination: Endpoint conditioned trajectory prediction,

    K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, and A. Gaidon, “It is not the journey but the destination: Endpoint conditioned trajectory prediction,” in ECCV, 2020

  45. [45]

    Muse-vae: Multi-scale VAE for environment-aware long term trajectory prediction,

    M. Lee, S. S. Sohn, S. Moon, S. Yoon, M. Kapadia, and V. Pavlovic, “Muse-vae: Multi-scale VAE for environment-aware long term trajectory prediction,” in CVPR, 2022. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 13

  46. [46]

    Motiondiffuser: Controllable multi-agent motion prediction using diffusion,

    C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in CVPR, 2023

  47. [47]

    Trace and pace: Controllable pedestrian animation via guided trajectory diffusion,

    D. Rempe, Z. Luo, X. Bin Peng, Y. Yuan, K. Kitani, K. Kreis, S. Fidler, and O. Litany, “Trace and pace: Controllable pedestrian animation via guided trajectory diffusion,” in CVPR, 2023

  48. [48]

    Singulartrajectory: Universal trajectory predictor using diffusion model,

    I. Bae, Y.-J. Park, and H.-G. Jeon, “Singulartrajectory: Universal trajectory predictor using diffusion model,” in CVPR, 2024

  49. [49]

    Bcdiff: Bidirectional consistent diffusion for instantaneous trajectory prediction,

    R. Li, C. Li, D. Ren, G. Chen, Y. Yuan, and G. Wang, “Bcdiff: Bidirectional consistent diffusion for instantaneous trajectory prediction,” in NeurIPS, 2023

  50. [50]

    Diffusion-es: Gradient-free planning with diffusion for autonomous driving and zero-shot instruction following,

    B. Yang, H. Su, N. Gkanatsios, T.-W. Ke, A. Jain, J. Schneider, and K. Fragkiadaki, “Diffusion-es: Gradient-free planning with diffusion for autonomous driving and zero-shot instruction following,” in CVPR, 2024

  51. [51]

    Moflow: One-step flow matching for human trajectory forecasting via implicit maximum likelihood estimation based distillation,

    Y. Fu, Q. Yan, L. Wang, K. Li, and R. Liao, “Moflow: One-step flow matching for human trajectory forecasting via implicit maximum likelihood estimation based distillation,” in CVPR, 2025

  52. [52]

    MART: Multiscale relational transformer networks for multi-agent trajectory prediction,

    S. Lee, J. Lee, Y. Yu, T. Kim, and K. Lee, “MART: Multiscale relational transformer networks for multi-agent trajectory prediction,” in ECCV, 2024

  53. [53]

    Multi-transmotion: Pre-trained model for human motion prediction,

    Y. Gao, P.-C. Luan, and A. Alahi, “Multi-transmotion: Pre-trained model for human motion prediction,” in CoRL, 2024

  54. [54]

    Multiagent off-screen behavior prediction in football,

    S. Omidshafiei, D. Hennes, M. Garnelo, Z. Wang, A. Recasens, E. Tarassov, Y. Yang, R. Elie, J. T. Connor, P. Muller et al., “Multiagent off-screen behavior prediction in football,” Scientific reports, 2022

  55. [55]

    Csdi: Conditional score-based diffusion models for probabilistic time series imputation,

    Y. Tashiro, J. Song, Y. Song, and S. Ermon, “Csdi: Conditional score-based diffusion models for probabilistic time series imputation,” in NeurIPS, 2021

  56. [56]

    Diffusion-based time series imputation and forecasting with structured state space models,

    J. M. L. Alcaraz and N. Strodthoff, “Diffusion-based time series imputation and forecasting with structured state space models,” TMLR, 2022

  57. [57]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017

  58. [58]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023

  59. [59]

    Collaborative uncertainty benefits multi-agent multi-modal trajectory forecasting,

    B. Tang, Y. Zhong, C. Xu, W.-T. Wu, U. Neumann, Y. Zhang, S. Chen, and Y. Wang, “Collaborative uncertainty benefits multi-agent multi-modal trajectory forecasting,” TPAMI, 2023

  60. [60]

    Toward reliable human pose forecasting with uncertainty,

    S. Saadatnejad, M. Mirmohammadi, M. Daghyani, P. Saremi, Y. Z. Benisi, A. Alimohammadi, Z. Tehraninasab, T. Mordan, and A. Alahi, “Toward reliable human pose forecasting with uncertainty,” RA-L, 2024

  61. [61]

    Uncertainty-aware trajectory prediction via rule-regularized heteroscedastic deep classification,

    K. Manas, C. Schlauch, A. Paschke, C. Wirth, and N. Klein, “Uncertainty-aware trajectory prediction via rule-regularized heteroscedastic deep classification,” in RSS, 2025

  62. [62]

    Bayesdiff: Estimating pixel-wise uncertainty in diffusion via bayesian inference,

    S. Kou, L. Gan, D. Wang, C. Li, and Z. Deng, “Bayesdiff: Estimating pixel-wise uncertainty in diffusion via bayesian inference,” in ICLR, 2024

  63. [63]

    Multipath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction,

    Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, “Multipath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction,” in CoRL, 2019

  64. [64]

    Covernet: Multimodal behavior prediction using trajectory sets,

    T. Phan-Minh, E. C. Grigore, F. A. Boulton, O. Beijbom, and E. M. Wolff, “Covernet: Multimodal behavior prediction using trajectory sets,” in CVPR, 2020

  65. [65]

    Trajectory unified transformer for pedestrian trajectory prediction,

    L. Shi, L. Wang, S. Zhou, and G. Hua, “Trajectory unified transformer for pedestrian trajectory prediction,” in ICCV, 2023

  66. [66]

    TNT: Target-driven trajectory prediction,

    H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen, Y. Shen, Y. Chai, C. Schmid et al., “TNT: Target-driven trajectory prediction,” in CoRL, 2021

  67. [67]

    Motion transformer with global intention localization and local movement refinement,

    S. Shi, L. Jiang, D. Dai, and B. Schiele, “Motion transformer with global intention localization and local movement refinement,” in NeurIPS, 2022

  68. [68]

    Trajflow: Multi-modal motion prediction via flow matching,

    Q. Yan, B. Zhang, Y. Zhang, D. Yang, J. White, D. Chen, J. Liu, L. Liu, B. Zhuang, S. Shi et al., “Trajflow: Multi-modal motion prediction via flow matching,” arXiv preprint arXiv:2506.08541, 2025

  69. [69]

    Score-based generative modeling through stochastic differential equations,

    Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in ICLR, 2021

  70. [70]

    Improved denoising diffusion probabilistic models,

    A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in ICML, 2021

  71. [71]

    Denoising diffusion implicit models,

    J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in ICLR, 2021

  72. [72]

    BERT: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL-HLT, 2019

  73. [73]

    Fast differentiable sorting and ranking,

    M. Blondel, O. Teboul, Q. Berthet, and J. Djolonga, “Fast differentiable sorting and ranking,” in ICML, 2020

  74. [74]

    Dag-net: Double attentive graph neural network for trajectory forecasting,

    A. Monti, A. Bertugli, S. Calderara, and R. Cucchiara, “Dag-net: Double attentive graph neural network for trajectory forecasting,” in ICPR, 2021

  75. [75]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 1997

  76. [76]

    Remember intentions: Retrospective-memory-based trajectory prediction,

    C. Xu, W. Mao, W. Zhang, and S. Chen, “Remember intentions: Retrospective-memory-based trajectory prediction,” in CVPR, 2022

  77. [77]

    Non-probability sampling network for stochastic human trajectory prediction,

    I. Bae, J.-H. Park, and H.-G. Jeon, “Non-probability sampling network for stochastic human trajectory prediction,” in CVPR, 2022.

    Guillem Capellera received the B.Sc. degree in Sports Science in 2017, B.Sc. degree in Mathematics and Physics in 2021, from University of Barcelona (UB); he then earned an M.Sc. degree in Computer Vision from Autonomous Universi...