Mind the Noise: Sensitivity of Transformer-based Interaction-Aware Trajectory Prediction Models to Noisy Data
Pith reviewed 2026-06-26 14:24 UTC · model grok-4.3
The pith
Realistic noise in object state inputs increases the prediction error of a Transformer-based trajectory predictor by a factor of up to 3.9.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
State-of-the-art Transformer-based interaction-aware trajectory prediction models, which rely on attention to model multi-agent interactions, exhibit rapid deterioration in prediction accuracy when supplied with noisy object state information that reflects real perception uncertainties and V2X localization errors.
What carries the argument
Attention mechanisms inside the Transformer that encode interactions among agents, which become unreliable once input states contain additive noise.
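To make the mechanism concrete, here is a toy sketch of how additive noise on agent inputs shifts scaled dot-product attention weights. It is not the model studied in the paper: the raw 2-D vectors stand in for learned key projections, and `attention_weights`, `perturb`, and the noise level `sigma=1.0` are hypothetical names and values chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def perturb(vectors, sigma, rng):
    """Add independent zero-mean Gaussian noise to every component."""
    return [[v + rng.gauss(0.0, sigma) for v in vec] for vec in vectors]


# Hypothetical toy inputs: one query agent and three neighbouring agents.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]

clean = attention_weights(query, keys)
noisy = attention_weights(query, perturb(keys, sigma=1.0, rng=random.Random(0)))
shift = sum(abs(c - n) for c, n in zip(clean, noisy))  # L1 change in the weights
print("clean:", clean)
print("noisy:", noisy)
print("L1 shift:", shift)
```

The L1 shift is only a rough indicator of how far the noise moves the attention pattern; whether such shifts explain the reported degradation would need to be checked on the actual model.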
If this is right
- Prediction error grows steadily with noise intensity rather than remaining stable up to a threshold.
- Even modest noise levels already multiply error by 1.3, while the upper end of realistic noise multiplies it by 3.9.
- Current clean training and evaluation sets do not expose the models to the conditions they will meet in deployment.
- Explicit noise-mitigation techniques or uncertainty-aware inputs become necessary for reliable performance.
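The multipliers in the list above can be read as ratios of an error metric under noisy inputs to the same metric under clean inputs. A minimal sketch of that arithmetic, assuming the factor is defined this way; the function name and the error values are hypothetical and were picked only so that the ratios land on 1.3 and 3.9, they are not taken from the paper.

```python
def degradation_factor(clean_error: float, noisy_error: float) -> float:
    """Ratio of the error under noisy inputs to the error under clean inputs."""
    if clean_error <= 0:
        raise ValueError("clean_error must be positive")
    return noisy_error / clean_error


# Hypothetical error values in metres (illustrative, not from the paper).
clean_min_ade = 0.50
small_noise_min_ade = 0.65
high_noise_min_ade = 1.95

small_factor = degradation_factor(clean_min_ade, small_noise_min_ade)
high_factor = degradation_factor(clean_min_ade, high_noise_min_ade)
print(f"small noise: {small_factor:.1f}x, high noise: {high_factor:.1f}x")
```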
Where Pith is reading between the lines
- Adding explicit noise during training could improve robustness without changing model architecture.
- The same sensitivity is likely to appear in other attention-based or graph-based predictors used for motion forecasting.
- Field trials that log both predicted trajectories and ground-truth positions under live V2X conditions would provide a direct test of the reported degradation factors.
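The first item in the list above, adding explicit noise during training, could be sketched as a simple input augmentation step. The sketch assumes independent zero-mean Gaussian perturbations of position, speed, and heading, which is the noise model named in the authors' rebuttal; the `AgentState` type, the function name, and the standard deviations are hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentState:
    x: float        # position, metres
    y: float        # position, metres
    speed: float    # metres per second
    heading: float  # radians


def add_state_noise(states, sigma_pos, sigma_speed, sigma_heading, rng):
    """Return a copy of `states` with independent zero-mean Gaussian noise added."""
    noisy = []
    for s in states:
        noisy.append(replace(
            s,
            x=s.x + rng.gauss(0.0, sigma_pos),
            y=s.y + rng.gauss(0.0, sigma_pos),
            speed=s.speed + rng.gauss(0.0, sigma_speed),
            heading=s.heading + rng.gauss(0.0, sigma_heading),
        ))
    return noisy


rng = random.Random(0)  # fixed seed so the sketch is repeatable
history = [AgentState(x=float(t), y=0.0, speed=10.0, heading=0.0) for t in range(5)]

# Hypothetical standard deviations; real values would come from measured error statistics.
noisy_history = add_state_noise(
    history, sigma_pos=0.5, sigma_speed=0.2, sigma_heading=0.05, rng=rng
)
print(noisy_history[0])
```

In a training loop, a step like this would be applied to the observed history only, leaving the ground-truth future untouched, so that the model learns to predict clean futures from noisy observations.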
Load-bearing premise
The noise models and intensity ranges tested in the experiments match the actual uncertainties that occur in vehicle perception systems and V2X communications.
What would settle it
Running the same model on a dataset collected from actual V2X-equipped vehicles where measured localization and perception errors are recorded, then checking whether accuracy drops match the factors reported in the controlled experiments.
Original abstract
Trajectory prediction allows autonomous vehicles to anticipate the future behavior of surrounding objects (or agents) and, accordingly, maximize the safety and efficiency of their driving. State-of-the-art Transformer-based interaction-aware trajectory prediction models, which rely on attention mechanisms to capture multi-agent interactions and maximize prediction accuracy, are commonly trained and evaluated on long-range high-quality datasets. These datasets are typically obtained by aggregating data from multiple vehicles or drones and removing any object detection or tracking noise offline. Yet, information about a surrounding object's state (its position, speed, heading) is far from noiseless in real-world deployments. Object state estimation is affected by perception uncertainties and localization errors that can be particularly large for objects received via Vehicle-to-Everything (V2X) communications. In this paper, we analyze the impact of noisy object state information on the trajectory prediction accuracy of a state-of-the-art Transformer-based interaction-aware trajectory prediction model. Our study demonstrates that trajectory prediction accuracy can rapidly deteriorate as the noise intensity increases. Numerical results show that the prediction accuracy can degrade by a factor of 1.3 under small noise levels and by as much as a factor of 3.9 under the highest (yet realistic) noise conditions. These findings reveal the strong sensitivity of trajectory prediction models to noisy data, underscoring the need for more realistic training and evaluation datasets as well as noise mitigation strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical analysis of the sensitivity of state-of-the-art Transformer-based interaction-aware trajectory prediction models to additive noise in object state inputs (position, speed, heading). It reports that prediction accuracy degrades by a factor of 1.3 under small noise levels and up to 3.9 under the highest tested (claimed realistic) noise conditions, based on experiments with noisy data, and concludes that current models trained on clean aggregated datasets are brittle and that more realistic training/evaluation protocols plus noise mitigation are required.
Significance. If the reported degradation factors prove reproducible and the injected noise distributions are shown to match measured V2X/perception error statistics, the work would usefully document a practical limitation of attention-based predictors and motivate noise-aware training regimes. The absence of any machine-checked proofs or parameter-free derivations is expected for an empirical sensitivity study; the value would lie in the falsifiable numerical measurements themselves.
major comments (2)
- [Abstract] The central numerical claims (1.3x and 3.9x accuracy reduction) are presented without any accompanying description of the exact noise distributions, variances, correlation structure, dataset splits, error metrics (ADE/FDE), statistical significance tests, or baseline comparisons. This prevents verification of the reported degradation factors from the provided text.
- [Abstract] The statement that the highest noise conditions are “yet realistic” is unsupported by any citation to empirical V2X localization error covariances, camera/radar tracking statistics, or field measurements; without such grounding the 1.3x–3.9x factors cannot be interpreted as representative of deployment conditions.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We agree that the abstract would benefit from additional details to support the reported degradation factors and the realism claim. We will revise the abstract in the next version to address these points while keeping it concise. Below we respond to each major comment.
Point-by-point responses
- Referee: [Abstract] The central numerical claims (1.3x and 3.9x accuracy reduction) are presented without any accompanying description of the exact noise distributions, variances, correlation structure, dataset splits, error metrics (ADE/FDE), statistical significance tests, or baseline comparisons. This prevents verification of the reported degradation factors from the provided text.
  Authors: We agree the abstract, as a summary, omits these details. The full manuscript specifies the noise model (additive independent Gaussian perturbations to position, speed, and heading with variances scaled to match typical sensor uncertainties), the evaluation metrics (ADE and FDE), dataset splits, and baseline comparisons, and reports statistical significance where applicable. To make the central claims verifiable from the abstract alone, we will revise it to briefly note the noise model, the metrics used, and that full experimental details appear in Sections 3 and 4. This change will be incorporated.
  Revision: yes
- Referee: [Abstract] The statement that the highest noise conditions are “yet realistic” is unsupported by any citation to empirical V2X localization error covariances, camera/radar tracking statistics, or field measurements; without such grounding the 1.3x–3.9x factors cannot be interpreted as representative of deployment conditions.
  Authors: The noise levels were selected to reflect reported ranges of V2X and perception errors discussed in the introduction and related-work sections of the full paper. We acknowledge that the abstract itself provides no citations for this claim. We will revise the abstract to include a brief supporting reference to representative V2X localization error statistics (e.g., position errors on the order of several meters under urban conditions) and add the corresponding citations. This will allow the degradation factors to be interpreted against deployment-relevant conditions.
  Revision: yes
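The rebuttal names ADE and FDE as the evaluation metrics, and the paper excerpts in the reference graph report their multi-modal variants minADE and minFDE. A minimal sketch of the usual definitions, assuming trajectories are lists of (x, y) points in metres; the function names and the toy trajectories are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all timesteps."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last timestep."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return math.dist(pred[-1], truth[-1])


def min_ade(modes, truth):
    """Smallest ADE among the candidate trajectories (modes)."""
    return min(ade(m, truth) for m in modes)


def min_fde(modes, truth):
    """Smallest FDE among the candidate trajectories (modes)."""
    return min(fde(m, truth) for m in modes)


# Toy example: a straight ground-truth path and two predicted modes.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away at every step
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # deviates only at the last step
]
print("minADE:", min_ade(modes, truth))
print("minFDE:", min_fde(modes, truth))
```

Because ADE averages over the whole horizon while FDE looks only at the endpoint, a larger relative increase in minADE than in minFDE suggests that the error is spread along the trajectory rather than concentrated at its end.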
Circularity Check
No circularity: purely empirical sensitivity study
The paper conducts an empirical analysis by injecting noise into input data and measuring resulting prediction accuracy drops on a Transformer model. No derivations, equations, fitted parameters presented as predictions, or self-citation chains are present. All reported factors (1.3x–3.9x) are direct experimental measurements, not outputs of a closed model or self-referential construction. The noise model grounding concern is a validity issue, not circularity.
Axiom & Free-Parameter Ledger
free parameters (1)
- noise intensity levels
axioms (1)
- domain assumption: The selected Transformer model and dataset are representative of state-of-the-art interaction-aware trajectory predictors.
Reference graph
Works this paper leans on
- [1] and the INTERACTION dataset [8], respectively. These datasets rely on offline post-processing pipelines to: (i) aggregate observations from multiple viewpoints and collect a larger number of surrounding objects, including objects beyond the line-of-sight range of the ego-vehicle’s onboard sensors; and (ii) remove detection and tracking noise to obtain hig...
- [2] The relative increase of all metrics ranges from 1.3x to 3.9x (observed in the minADE case). It is worth highlighting that, in both Fig. 2 and Fig. 3, minADE exhibits a larger relative increase than minFDE, across all V2X noise levels. This indicates that noisy object state information affects the overall shape and temporal consistency of predicted tra...
- [3] B. Zhang et al., “DeMo++: Motion Decoupling for Autonomous Driving,” arXiv:2507.17342, July 2025.
- [4] Z. Lan et al., “SEPT: Towards Efficient Scene Representation Learning for Motion Prediction,” in Proc. ICLR 2024, Vienna, Austria, 2024.
- [5] M. Wang et al., “FutureNet-LoF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding,” in Proc. ICRA 2024, Yokohama, Japan, 2024, pp. 8841–8848.
- [6] J. Ding et al., “Understanding World or Predicting Future? A Comprehensive Survey of World Models,” ACM Computing Surveys, vol. 58, no. 3, 2026, pp. 1–38.
- [7] N. A. Madjid et al., “Trajectory prediction for autonomous driving: Progress, limitations, and future directions,” Elsevier Information Fusion, vol. 126, Feb. 2026, pp. 1–59.
- [8] F. Diehl et al., “Graph neural networks for modelling traffic participant interaction,” in Proc. IEEE IV 2019, Paris, France, 2019, pp. 695–701.
- [9] B. Wilson et al., “Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting,” arXiv:2301.00493, Jan. 2023.
- [10] W. Zhan et al., “INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps,” arXiv:1910.03088, Sep. 2019.
- [11] Y. Huang et al., “A Survey on Trajectory-Prediction Methods for Autonomous Driving,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, Sep. 2022, pp. 652–674.
- [12] L. Zhang, P. Li, S. Liu, and S. Shen, “SIMPL: A Simple and Efficient Multi-Agent Motion Prediction Baseline for Autonomous Driving,” IEEE Robot. Autom. Lett., vol. 9, no. 4, Apr. 2024, pp. 3767–3774.
- [13] H. Caesar et al., “nuScenes: A multimodal dataset for autonomous driving,” in Proc. IEEE/CVF CVPR 2020, Seattle, WA, USA, June 2020, pp. 11621–11631.
- [14] H. Zhang et al., “OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising,” in Proc. IEEE/CVF CVPR 2024, Seattle, WA, USA, Jun. 2024, pp. 14802–14811.
- [15] S. Ettinger et al., “Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset,” in Proc. IEEE/CVF ICCV 2021, Montreal, Canada, Oct. 2021, pp. 9710–9719.
- [16] J. Houston et al., “One Thousand and One Hours: Self-driving Motion Prediction Dataset,” in Proc. CoRL 2020, Cambridge, MA, USA, Jan. 2021, pp. 409–418.
- [17] A. Mohammadisarab et al., “How the Fusion of Onboard Sensors and V2X Data can Improve (or not) the Cooperative Perception of Connected Automated Vehicles,” in Proc. VTC2025-Spring, Oslo, Norway, 2025, pp. 1–5.
- [18] G. Thandavarayan et al., “Generation of Cooperative Perception Messages for Connected and Automated Vehicles,” IEEE Transactions on Vehicular Technology, vol. 69, no. 12, Dec. 2020, pp. 16336–16341.
- [19] M. M. S. Alghananim and W. Y. Ochieng, “Maximum non-bounded difference method for overbounding Global Navigation Satellite System errors,” GPS Solutions, vol. 29, no. 1, 2025, pp. 1–14.
Figure captions captured alongside entry [19]: Fig. 3. Relative increase of the minADE, minFDE, and MR metrics with respect to the original noiseless scenario under increasing V2X noise levels. Fig. 2. Relative incr...