LAMP: Lane-Aligned Motion Primitives for Feasible Trajectory Prediction

Changhyun Choi; H. Jin Kim; Hoseong Jung; Jeongtae Her; Sangjin Han

arxiv: 2606.26661 · v1 · pith:KRGKATKNnew · submitted 2026-06-25 · 💻 cs.RO · cs.AI

Sangjin Han, Hoseong Jung, Jeongtae Her, Changhyun Choi, H. Jin Kim

Pith reviewed 2026-06-26 05:18 UTC · model grok-4.3

keywords: motion forecasting, trajectory prediction, autonomous driving, lane topology, VQ-VAE, motion primitives, feasibility

The pith

LAMP anchors multimodal predictions to lane-aligned motion primitives learned by VQ-VAE to raise feasibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LAMP to address a common problem: multimodal motion predictors often produce trajectories that violate lane topology, especially in their low-probability modes. It uses a VQ-VAE to learn shape-aware motion primitives that serve as discrete intention queries and capture spatiotemporal patterns beyond endpoints. A feasibility-aware selector trained with a lane-topology prior then filters unreachable queries before decoding, steering the decoder toward topology-consistent paths while preserving behavioral diversity. On the Argoverse 2 dataset, the method reports accuracy comparable to state-of-the-art baselines together with better feasibility and diversity scores.

Core claim

LAMP anchors multimodal prediction to structured motion primitives aligned with lane topology by using a VQ-VAE to learn discrete intention queries that capture spatiotemporal patterns beyond endpoints and by training a feasibility-aware intention selector with a lane-topology prior that filters unreachable queries, guiding the decoder to produce topology-consistent yet diverse predictions.

What carries the argument

Lane-aligned motion primitives as discrete codes from a VQ-VAE, filtered by a feasibility-aware intention selector that uses a lane-topology prior.
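
To make the "discrete codes" idea concrete, here is a minimal sketch of the nearest-neighbour lookup that a VQ-VAE [12] uses to map a continuous encoding to a codebook entry. The toy 2-D codebook and the names `squared_distance` and `quantize` are illustrative assumptions; the paper's encoder, latent size, and codebook are not described on this page.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index and value of the codebook entry closest to `latent`.

    In a trained VQ-VAE the index would play the role of a discrete
    intention query; here the codebook is a hand-written toy (assumption).
    """
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(latent, codebook[k]))
    return index, list(codebook[index])


# Toy codebook with three 2-D codes (hypothetical values).
codebook = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
index, code = quantize([0.9, 0.2], codebook)
print(index, code)
```

Training additionally needs a codebook loss, a commitment loss, and a straight-through gradient estimator, all of which this lookup-only sketch leaves out.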

If this is right

Predicted trajectories violate fewer physical and logical lane constraints.
Prediction sets become more reliable inputs for downstream safety-critical planning.
Diversity is preserved because the selector does not remove all low-probability but reachable modes.
Accuracy on standard displacement metrics remains comparable to existing methods.
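
The page does not name the "standard displacement metrics". Assuming they are the usual minADE and minFDE, the sketch below shows how they are typically computed over a set of predicted modes. The function names and toy trajectories are hypothetical, and the minima are taken independently per metric; benchmark implementations may instead pick the best mode by endpoint error.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement(p: Point, q: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def min_ade_fde(predictions: Sequence[Sequence[Point]],
                ground_truth: Sequence[Point]) -> Tuple[float, float]:
    """Smallest average and final displacement error over all modes."""
    ades: List[float] = []
    fdes: List[float] = []
    for trajectory in predictions:
        errors = [displacement(p, g) for p, g in zip(trajectory, ground_truth)]
        ades.append(sum(errors) / len(errors))  # average over time steps
        fdes.append(errors[-1])                 # error at the final step
    return min(ades), min(fdes)


# Toy example: one ground-truth path and two predicted modes.
ground_truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
predictions = [
    [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)],  # drifts away from the path
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # deviates only at the end
]
min_ade, min_fde = min_ade_fde(predictions, ground_truth)
print(min_ade, min_fde)
```

Because both metrics score only the best mode, they say nothing about whether the remaining modes are feasible, which is the gap the paper targets.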

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Planners might need fewer separate post-hoc feasibility filters when using these predictions.
The same discrete primitive approach could transfer to other structured prediction domains with topology constraints.
Long-horizon consistency might improve because early choices are already lane-constrained.

Load-bearing premise

The lane-topology prior correctly identifies unreachable intention queries without discarding behaviorally important modes.
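
The premise can be illustrated with a hard-mask stand-in for the selector: given K candidate queries, keep at most L of those a reachability mask allows, ranked by score. This is a sketch under stated assumptions, not the paper's method. The actual selector is learned, and the names `select_intentions`, `scores`, and `reachable` are hypothetical.

```python
from typing import List, Sequence


def select_intentions(scores: Sequence[float],
                      reachable: Sequence[bool],
                      num_keep: int) -> List[int]:
    """Return indices of up to `num_keep` reachable queries, best score first.

    `reachable[k]` stands in for the lane-topology prior's verdict on
    query k (assumption: a binary mask rather than a learned probability).
    """
    candidates = [k for k, ok in enumerate(reachable) if ok]
    candidates.sort(key=lambda k: scores[k], reverse=True)
    return candidates[:num_keep]


# Five hypothetical intention queries (K = 5); two are marked unreachable.
scores = [0.40, 0.25, 0.20, 0.10, 0.05]
reachable = [True, False, True, True, False]
kept = select_intentions(scores, reachable, num_keep=2)
print(kept)
```

Query 1 has the second-highest score yet is dropped. If its mask entry is wrong, for instance because the lane graph is incomplete, a behaviorally important mode disappears without any signal in the feasibility metric.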

What would settle it

If feasibility and diversity metrics on Argoverse 2 show no gain over baselines while displacement error stays comparable, the performance advantage would not hold.

Figures

Figures reproduced from arXiv: 2606.26661 by Changhyun Choi, H. Jin Kim, Hoseong Jung, Jeongtae Her, Sangjin Han.

**Figure 1.** LAMP improves the reliability of multimodal trajectory predictions. (a) A VQ-VAE provides a discrete set of intention queries; each query is decoded into a motion primitive representing a plausible future trajectory. (b) Infeasible intention queries are removed via lane-topology-guided selection, producing a feasible and diverse set of trajectories for downstream planning.

**Figure 2.** Learning motion primitives with VQ-VAE [12]. Collected motion trajectories …

**Figure 3.** Overview of the LAMP framework. (a) Scene context is encoded from agent histories and map polylines. (b) A feasibility-guided intention selector evaluates motion primitives and retains L topology-consistent candidates from K hypotheses. The selected intention embeddings and scene features are fed into a Transformer decoder. (c) Multi-head self-attention (MHSA) refines motion queries formed from static inte…

**Figure 4.** Qualitative comparison of LAMP and MTR on two intersection scenarios. Each panel visualizes multimodal predictions with confidence scores pk, along with agent history, ground-truth future, and final position. Scenario 1 highlights improved diversity of LAMP, while Scenario 2 shows improved feasibility by reducing off-road predictions compared to MTR.

B. Main Results. Table I presents the quantitative result…

Abstract

Motion forecasting is essential for autonomous driving systems to enable safe decision-making and planning in complex driving scenarios. While existing predictors excel at minimizing standard displacement errors, they often overlook the adherence to lane topology of multimodal predictions, particularly for lower-probability modes. Consequently, predicted trajectories may violate physical and logical constraints, making the prediction set unreliable for safety-critical planning. In this paper, we propose LAMP (Lane-Aligned Motion Primitives), a topology-aware forecasting framework that anchors multimodal prediction to structured motion primitives aligned with lane topology. Specifically, we use a VQ-VAE to learn shape-aware motion primitives as discrete intention queries, capturing spatiotemporal patterns beyond endpoint-based intentions. We further introduce a feasibility-aware intention selector trained with a lane-topology prior for filtering unreachable intention queries, guiding the decoder to prioritize topology-consistent intentions while preserving behavioral diversity. Extensive experiments on the Argoverse 2 dataset demonstrate that LAMP achieves prediction accuracy comparable to state-of-the-art baselines while outperforming them in feasibility and diversity metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LAMP pairs VQ-VAE shape primitives with a lane-topology selector to push feasibility in multimodal forecasts, but the abstract supplies no tables or ablations so the gains remain unverified.

The paper's main move is to replace endpoint-style intentions with VQ-VAE-derived motion primitives that encode trajectory shape, then route those primitives through a learned selector that drops ones the lane graph marks unreachable. This is presented as a way to keep diversity while cutting down on physically invalid modes that planners would reject.

The VQ-VAE step is the clearest addition. Learning a discrete codebook over full trajectories rather than just endpoints can in principle capture lane-change curvature or acceleration profiles that endpoint methods miss. Adding an explicit topology prior at the selector stage is a straightforward way to inject map information without forcing the decoder to learn it from scratch.
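
A toy comparison shows why shape can matter beyond endpoints: two lane changes that finish at the same point are indistinguishable to an endpoint-only intention, while a distance over the whole trajectory separates them. The trajectories and the names `endpoint_distance` and `shape_distance` are illustrative assumptions, not quantities defined in the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def endpoint_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance between the final points only."""
    return math.dist(a[-1], b[-1])


def shape_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean point-wise distance over the whole trajectory."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Both paths end at (4, 2); one shifts laterally early, the other late.
early_change = [(1.0, 1.0), (2.0, 2.0), (3.0, 2.0), (4.0, 2.0)]
late_change = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]

print(endpoint_distance(early_change, late_change))
print(shape_distance(early_change, late_change))
```

A codebook trained on full trajectories can, in principle, assign these two maneuvers to different codes. Whether LAMP's learned codebook actually does so is the kind of question an ablation would need to answer.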

The soft spot is the missing evidence. The abstract states comparable accuracy plus better feasibility and diversity on Argoverse 2, yet gives no numbers, no baseline tables, no ablation on codebook size, and no description of how the selector threshold was set. Without those details it is impossible to judge whether the reported feasibility lift comes from the model learning better behaviors or simply from the prior discarding modes by construction. The stress-test worry about over-filtering valid off-lane maneuvers therefore lands; if the lane graph is noisy or incomplete, the selector could quietly remove legitimate cut-ins or merges and then claim credit for higher feasibility.

The work is aimed at researchers building predictors that must feed directly into planning modules. Anyone already using discrete latent representations for trajectories would see the VQ-VAE component as worth examining. The paper shows clear thinking about the feasibility gap in current multimodal predictors, so it is coherent on its own terms even if the quantitative support is thin.

I would send it to review. The idea is concrete enough that referees can ask for the missing tables and ablations rather than reject outright.

Referee Report

3 major / 2 minor

Summary. The paper proposes LAMP, a topology-aware motion forecasting framework that learns shape-aware motion primitives via VQ-VAE as discrete intention queries and employs a feasibility-aware intention selector trained with a lane-topology prior to filter unreachable queries before decoding. It claims that this yields prediction accuracy comparable to state-of-the-art baselines while improving feasibility and diversity metrics on the Argoverse 2 dataset.

Significance. If the central claims hold after verification of the prior, the approach could offer a practical way to anchor multimodal predictions to lane structure without sacrificing coverage of relevant behaviors, which would be useful for safety-critical planning. The VQ-VAE component for capturing spatiotemporal patterns beyond endpoints is a potentially reusable idea if ablations confirm its contribution independent of the selector.

major comments (3)

[Experiments] Experiments section: the headline claim of superior feasibility and diversity rests on the lane-topology prior correctly labeling only unreachable intentions; no analysis is provided of the fraction of Argoverse 2 ground-truth trajectories that the prior would classify as unreachable, which is required to rule out over-filtering of valid lane-change or cut-in modes.
[Method] Method section on VQ-VAE and selector: the codebook size and selector threshold are listed as free parameters with no reported ablation or selection procedure; without these the reported gains cannot be reproduced or shown to be robust rather than tuned to the evaluation set.
[Results] Results: quantitative tables, error bars, and per-metric breakdowns comparing LAMP to baselines on accuracy, feasibility, and diversity are referenced in the abstract but the provided manuscript text supplies none, preventing assessment of effect sizes or statistical significance.
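
The audit requested in the first major comment could be sketched as follows: for each scenario, check whether the lane on which the ground-truth trajectory ends belongs to the set of lanes the prior considers reachable, and report the rejection rate per maneuver type. This assumes reachability can be expressed as a set of lane identifiers. The maneuver labels, lane ids, and the function name `rejection_rates` are hypothetical, and the samples are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (maneuver label, lane id where the ground truth ends, lanes the prior allows)
Sample = Tuple[str, str, Set[str]]


def rejection_rates(samples: Iterable[Sample]) -> Dict[str, float]:
    """Fraction of ground-truth futures the prior would reject, per maneuver."""
    totals: Dict[str, int] = defaultdict(int)
    rejected: Dict[str, int] = defaultdict(int)
    for maneuver, gt_lane, reachable_lanes in samples:
        totals[maneuver] += 1
        if gt_lane not in reachable_lanes:
            rejected[maneuver] += 1
    return {m: rejected[m] / totals[m] for m in totals}


# Made-up scenarios, not data from Argoverse 2.
samples = [
    ("keep_lane", "a", {"a", "b"}),
    ("keep_lane", "b", {"b"}),
    ("lane_change", "c", {"a", "b"}),  # ground truth ends on a lane the prior excludes
    ("lane_change", "b", {"a", "b"}),
]
rates = rejection_rates(samples)
print(rates)
```

A non-trivial rejection rate for lane changes or cut-ins would indicate that the feasibility gain is partly bought by discarding behaviors that drivers actually perform.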

minor comments (2)

[Method] Notation for the discrete queries and the selector output probabilities should be defined explicitly in the method section to avoid ambiguity when describing the filtering step.
[Abstract] The abstract states 'extensive experiments' but the manuscript would benefit from a short table summarizing the key hyperparameter choices (codebook size, threshold) even if full ablations are moved to the supplement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful comments and the recommendation for major revision. We will address the concerns regarding the lane-topology prior analysis, hyperparameter ablations, and presentation of quantitative results in the revised manuscript.

Referee: [Experiments] Experiments section: the headline claim of superior feasibility and diversity rests on the lane-topology prior correctly labeling only unreachable intentions; no analysis is provided of the fraction of Argoverse 2 ground-truth trajectories that the prior would classify as unreachable, which is required to rule out over-filtering of valid lane-change or cut-in modes.

Authors: We agree that this analysis is necessary to validate the prior. In the revised version, we will add a new subsection or table reporting the percentage of ground-truth trajectories classified as unreachable by the lane-topology prior, with breakdowns by scenario types including lane changes and cut-ins. This will confirm that the prior does not excessively filter valid behaviors while improving feasibility.

Revision: yes

Referee: [Method] Method section on VQ-VAE and selector: the codebook size and selector threshold are listed as free parameters with no reported ablation or selection procedure; without these the reported gains cannot be reproduced or shown to be robust rather than tuned to the evaluation set.

Authors: We acknowledge the importance of reporting hyperparameter choices. We will include ablations on the codebook size (varying from 64 to 512) and selector threshold in the experiments section of the revision. The selection procedure, based on validation performance for feasibility and diversity without sacrificing accuracy, will also be detailed.

Revision: yes

Referee: [Results] Results: quantitative tables, error bars, and per-metric breakdowns comparing LAMP to baselines on accuracy, feasibility, and diversity are referenced in the abstract but the provided manuscript text supplies none, preventing assessment of effect sizes or statistical significance.

Authors: The manuscript text does reference the results, but we recognize that detailed tables may not have been sufficiently included or visible. We will ensure the revised manuscript prominently features Tables 1-3 with quantitative comparisons, including error bars (standard deviation over multiple runs) and per-metric breakdowns. Statistical significance tests (e.g., paired t-tests) will be added where appropriate.

Revision: yes
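
For readers unfamiliar with the paired t-test mentioned in the last response, a minimal sketch of the test statistic follows. It assumes one metric value per matched run (for example, per random seed) for each method. The numbers below are placeholders invented for illustration and are not results from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float],
                       b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and degrees of freedom for a minus b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Placeholder per-run violation counts for two methods (made-up values).
baseline_runs = [5.0, 7.0, 6.0, 8.0]
candidate_runs = [3.0, 4.0, 5.0, 5.0]
t_stat, dof = paired_t_statistic(baseline_runs, candidate_runs)
print(round(t_stat, 3), dof)
```

The statistic still has to be compared against a t distribution with `dof` degrees of freedom to obtain a p-value. With only a handful of runs, the test has little power, so reporting the per-run values alongside it is advisable.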

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

The paper describes a VQ-VAE for discrete motion primitives followed by a feasibility-aware selector that incorporates an external lane-topology prior from map data. No equations, fitted parameters, or self-citations are quoted that reduce the reported feasibility or diversity metrics to the selector's training prior by construction. The prior is presented as an independent input rather than a self-defined quantity, and all metrics are evaluated against the Argoverse 2 ground truth without evidence of tautological re-labeling. This is the normal case of a non-circular architecture paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that lane topology supplies a reliable filter for reachable intentions and on standard unsupervised learning assumptions for VQ-VAE convergence; no new physical entities are postulated.

free parameters (1)

VQ-VAE codebook size
The number of discrete motion primitives must be chosen; its value is not stated in the abstract but directly affects the intention query set.
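
One common diagnostic when choosing a codebook size is usage perplexity: the exponential of the entropy of code assignments, which approaches the codebook size when all codes are used equally and drops toward 1 when the codebook collapses onto a few entries. The sketch below is a generic illustration and not something the paper is known to report; `codebook_perplexity` and the toy assignments are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def codebook_perplexity(assignments: Sequence[int]) -> float:
    """exp(entropy) of the empirical distribution over assigned code indices."""
    counts = Counter(assignments)
    total = len(assignments)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Toy assignments: evenly used codes versus a collapsed codebook.
balanced = [0, 1, 2, 3, 0, 1, 2, 3]
collapsed = [2, 2, 2, 2, 2, 2, 2, 2]
print(codebook_perplexity(balanced))
print(codebook_perplexity(collapsed))
```

Tracking this value across candidate codebook sizes would show whether a larger codebook adds usable primitives or only unused entries.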

axioms (1)

Domain assumption: Lane topology provides a reliable prior for reachable intentions
Invoked when the feasibility-aware intention selector is trained and applied to filter VQ-VAE queries.

invented entities (1)

Lane-aligned motion primitives (no independent evidence)
purpose: Discrete shape-aware intention queries that capture spatiotemporal patterns beyond endpoints
Learned by VQ-VAE and used as input to the decoder; no independent evidence outside the learned codebook is supplied.

Reference graph

Works this paper leans on

31 extracted references

[1] S. Shi, L. Jiang, D. Dai, and B. Schiele, “Motion transformer with global intention localization and local movement refinement,” Advances in Neural Information Processing Systems, vol. 35, pp. 6531–6543, 2022.
[2] J. Gao, C. Sun, H. Zhao, et al., “VectorNet: Encoding HD maps and agent dynamics from vectorized representation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11525–11533.
[3] N. Lee, W. Choi, P. Vernaza, et al., “Desire: Distant future prediction in dynamic scenes with interacting agents,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 336–345.
[4] A. Gupta, J. Johnson, L. Fei-Fei, et al., “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255–2264.
[5] T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data,” in Proceedings of the European Conference on Computer Vision, 2020, pp. 683–700.
[6] M. Bansal, A. Krizhevsky, and A. Ogale, “ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst,” in Proceedings of Robotics: Science and Systems, 2019.
[7] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, “Multipath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction,” in Proceedings of the Conference on Robot Learning, 2020, pp. 86–99.
[8] T. Phan-Minh, E. C. Grigore, F. A. Boulton, et al., “CoverNet: Multimodal behavior prediction using trajectory set,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14074–14083.
[9] H. Zhao, J. Gao, T. Lan, et al., “TNT: Target-driven trajectory prediction,” in Proceedings of the Conference on Robot Learning, 2021, pp. 895–904.
[10] J. Gu, C. Sun, and H. Zhao, “DenseTNT: End-to-end trajectory prediction from dense goal sets,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15303–15312.
[11] S. Shi, L. Jiang, D. Dai, and B. Schiele, “MTR++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 3955–3971, 2024.
[12] A. Van Den Oord and O. Vinyals, “Neural discrete representation learning,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[13] L. Feng, M. Bahari, K. M. B. Amor, et al., “Unitraj: A unified framework for scalable vehicle trajectory prediction,” in Proceedings of the European Conference on Computer Vision, 2024, pp. 106–123.
[14] B. Wilson, W. Qi, T. Agarwal, et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021.
[15] R. Girgis, F. Golemo, F. Codevilla, et al., “Latent variable sequential set transformers for joint multi-agent motion prediction,” in Proceedings of the International Conference on Learning Representations, 2022.
[16] B. Varadarajan, A. Hefny, A. Srivastava, et al., “Multipath++: Efficient information fusion and trajectory aggregation for behavior prediction,” in Proceedings of the International Conference on Robotics and Automation, 2022, pp. 7814–7821.
[17] J. Ngiam, V. Vasudevan, B. Caine, et al., “Scene Transformer: A unified architecture for predicting future trajectories of multiple agents,” in Proceedings of the International Conference on Learning Representations, 2022.
[18] J. Cheng, X. Mei, and M. Liu, “Forecast-MAE: Self-supervised pre-training for motion forecasting with masked autoencoders,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8679–8689.
[19] N. Nayakanti, R. Al-Rfou, A. Zhou, et al., “Wayformer: Motion forecasting via simple & efficient attention networks,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2023, pp. 2980–2987.
[20] Z. Zhang, A. Li, A. Lim, and M. Chen, “Predicting long-term human behaviors in discrete representations via physics-guided diffusion,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024, pp. 11500–11507.
[21] R. Benaglia, A. Porrello, P. Buzzega, et al., “Trajectory forecasting through low-rank adaptation of discrete latent codes,” in Proceedings of the International Conference on Pattern Recognition, 2024, pp. 236–251.
[22] M. H. Vali and T. Bäckström, “NSVQ: Noise substitution in vector quantization for machine learning,” IEEE Access, vol. 10, pp. 13598–13610, 2022.
[23] F. Mentzer, D. Minnen, E. Agustsson, and M. Tschannen, “Finite scalar quantization: VQ-VAE made simple,” in Proceedings of the International Conference on Learning Representations, 2024.
[24] S. Casas, C. Gulino, S. Suo, et al., “Implicit latent variable model for scene-consistent motion forecasting,” in Proceedings of the European Conference on Computer Vision, 2020, pp. 624–641.
[25] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
[26] A. Prutsch, H. Bischof, and H. Possegger, “Efficient motion prediction: A lightweight & accurate trajectory prediction model with fast training and inference speed,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024, pp. 9411–9417.
[27] C. Chen, M. Pourkeshavarz, and A. Rasouli, “CRITERIA: A new benchmarking paradigm for evaluating trajectory prediction models for autonomous driving,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2024, pp. 8265–8271.
[28] S. H. Park, G. Lee, J. Seo, et al., “Diverse and admissible trajectory forecasting through multimodal context understanding,” in Proceedings of the European Conference on Computer Vision, 2020, pp. 282–298.
[29] Y. Yuan and K. Kitani, “DLow: Diversifying latent flows for diverse human motion prediction,” in Proceedings of the European Conference on Computer Vision, 2020, pp. 346–364.
[30] Y. J. Ma, J. P. Inala, D. Jayaraman, and O. Bastani, “Likelihood-based diverse sampling for trajectory forecasting,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 13279–13288.
[31] E. J. Hu, Y. Shen, P. Wallis, et al., “LoRA: Low-rank adaptation of large language models,” in Proceedings of the International Conference on Learning Representations, 2022.
