
arxiv: 2604.24514 · v2 · pith:S6PFNZRGnew · submitted 2026-04-27 · 💻 cs.LG

SceneSelect: Selective Learning for Trajectory Scene Classification and Expert Scheduling

Pith reviewed 2026-05-22 09:55 UTC · model grok-4.3

classification 💻 cs.LG
keywords trajectory prediction · scene classification · expert scheduling · selective learning · unsupervised clustering · motion forecasting · heterogeneous environments

The pith

SceneSelect classifies scenes via unsupervised clustering on geometric and kinematic features to route each trajectory to its best expert predictor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Trajectory prediction is hard because real scenes differ sharply in speed, density, and interaction style, yet most systems still train one model to cover everything. The paper claims that first discovering natural scene types through clustering on simple geometric and motion features, then routing each new input to a specialized expert, closes the resulting accuracy gap. A lightweight classifier labels the scene on the fly and a scheduling rule hands the sequence to the matching predictor. Because the classifier and experts stay separate, users can plug in new models or switch datasets without retraining the entire system. Experiments on ETH-UCY, SDD, and NBA show consistent gains over single models and ensembles.

Core claim

SceneSelect uses unsupervised clustering on interpretable geometric and kinematic features to discover a latent scene taxonomy whose categories align with distinct optimal expert predictors. A highly decoupled classification module assigns real-time inputs to these categories, while an extensible plug-and-play scheduling policy dispatches each trajectory sequence to the best expert. This decoupled design supports integration with off-the-shelf models and adaptation to new datasets without computationally expensive joint retraining.
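
The three stages named in the core claim (clustering, classification, scheduling) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two features (mean speed and mean pairwise spacing), the farthest-point k-means, the nearest-centroid rule standing in for the paper's trained classification module, and all names (`scene_features`, `Router`, `make_scene`) are hypothetical, and the scenes and validation errors are synthetic.

```python
import numpy as np

def scene_features(scene):
    """scene: float array (agents, timesteps, 2) of xy positions.
    Returns two illustrative features: mean speed and mean pairwise spacing."""
    speed = np.linalg.norm(np.diff(scene, axis=1), axis=-1).mean()
    last = scene[:, -1, :]
    dists = np.linalg.norm(last[:, None, :] - last[None, :, :], axis=-1)
    n = len(last)
    spacing = dists.sum() / (n * (n - 1)) if n > 1 else 0.0
    return np.array([speed, spacing])

def nearest(x, centers):
    """Index of the closest center for every row of x."""
    return np.argmin(np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1), axis=1)

def kmeans(x, k, iters=20):
    """Plain Lloyd iterations with a deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(k - 1):
        gap = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(gap)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = nearest(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, nearest(x, centers)

def build_schedule(labels, val_errors, k):
    """val_errors[i, e]: validation error of expert e on scene i.
    Maps each cluster to the expert with the lowest mean error on it
    (assumes every cluster is non-empty)."""
    return np.array([int(np.argmin(val_errors[labels == c].mean(axis=0))) for c in range(k)])

class Router:
    """Hypothetical wrapper: cluster scenes offline, then dispatch new scenes online."""
    def __init__(self, scenes, val_errors, k):
        feats = np.stack([scene_features(s) for s in scenes])
        self.mu, self.sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-9
        self.centers, self.labels = kmeans((feats - self.mu) / self.sigma, k)
        self.schedule = build_schedule(self.labels, val_errors, k)

    def route(self, scene):
        """Return the index of the expert scheduled for this scene."""
        z = (scene_features(scene) - self.mu) / self.sigma
        return int(self.schedule[nearest(z[None, :], self.centers)[0]])

def make_scene(rng, speed, spread, agents=12, steps=8):
    """Synthetic scene: agents scattered with the given spread, all moving one way."""
    start = rng.normal(scale=spread, size=(agents, 1, 2))
    angle = rng.uniform(0, 2 * np.pi)
    step = speed * np.array([np.cos(angle), np.sin(angle)])
    return start + step * np.arange(steps).reshape(1, steps, 1)

rng = np.random.default_rng(0)
slow = [make_scene(rng, speed=0.1, spread=5.0) for _ in range(20)]   # sparse, slow
fast = [make_scene(rng, speed=1.5, spread=0.5) for _ in range(20)]   # dense, fast
# Synthetic validation errors: expert 0 suits slow scenes, expert 1 suits fast ones.
val_errors = np.array([[0.2, 0.5]] * 20 + [[0.9, 0.4]] * 20)
router = Router(slow + fast, val_errors, k=2)
print(router.route(make_scene(rng, 0.1, 5.0)), router.route(make_scene(rng, 1.5, 0.5)))
```

Because the router only stores feature statistics, cluster centers, and a cluster-to-expert table, swapping in a new expert in this sketch means recomputing `val_errors` and the schedule, which mirrors the decoupling the paper emphasizes.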

What carries the argument

Unsupervised clustering on geometric and kinematic features that discovers a latent scene taxonomy, combined with a decoupled classification module and extensible scheduling policy that routes inputs to the optimal expert predictor.

If this is right

  • A single unified model leaves a generalization gap when scene heterogeneity is high.
  • Routing to scene-specific experts reduces prediction error and avoids computation on mismatched models.
  • The decoupled classifier and scheduler allow new predictors to be added or datasets changed without joint retraining.
  • The approach delivers an average 10.5 percent improvement on ETH-UCY, SDD, and NBA benchmarks over strong baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same scene-taxonomy idea could be tested on other prediction tasks that suffer from environment variation, such as traffic flow forecasting under different road layouts.
  • If the clustering features miss key interaction patterns, performance on densely populated multi-agent scenes may fall short of the reported gains.
  • The discovered taxonomy might transfer across tasks if the geometric features capture fundamental motion regimes rather than dataset-specific details.

Load-bearing premise

Clustering scenes by geometric and kinematic features produces groups in which each group has one clearly superior expert model.
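
One way to probe this premise is to tabulate, for each cluster, which expert wins and by how much. The sketch below is illustrative only: the function name `expert_margins`, the relative-margin definition, and the toy error values are assumptions, not quantities reported in the paper. It needs at least two experts.

```python
import numpy as np

def expert_margins(labels, errors):
    """For each cluster, return (best expert, relative margin over the runner-up).
    errors[i, e] is the error of expert e on scene i; labels[i] is the scene's cluster."""
    report = {}
    for c in np.unique(labels):
        mean_err = errors[labels == c].mean(axis=0)
        order = np.argsort(mean_err)
        best, second = mean_err[order[0]], mean_err[order[1]]
        report[int(c)] = (int(order[0]), float((second - best) / second))
    return report

labels = np.array([0, 0, 0, 1, 1, 1])
errors = np.array([
    [0.20, 0.40], [0.30, 0.50], [0.10, 0.30],   # cluster 0: expert 0 clearly ahead
    [0.50, 0.48], [0.40, 0.38], [0.60, 0.58],   # cluster 1: experts nearly tied
])
report = expert_margins(labels, errors)
print(report)
```

A cluster with a margin near zero, like cluster 1 in this toy data, would be a case where routing adds little, so the premise holds only to the extent that most clusters show a large margin.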

What would settle it

Testing the full pipeline on a dataset whose scenes fall outside the original clusters and finding that routed experts no longer outperform a single model trained on all data.

Figures

Figures reproduced from arXiv: 2604.24514 by Deshun Xia, Weijie Zhu, Xinrun Wang, Yuxi Sun.

Figure 1: Empirical analysis on ETH-UCY. (a) Accuracy vs. efficiency trade-off across varying degrees of scene heterogeneity: lightweight models excel in sparse scenes, while Transformers dominate complex ones. (b) Target trajectories naturally partition into distinct scene clusters via PCA based on motion velocity, spatial density, and interaction patterns, revealing systematic scene heterogeneity. Empirically, thi…

Figure 2: Performance comparison between SOTA single model and SceneSelect. (a) ADE and (b) FDE on Dense Crowd, Sparse Static, and Dynamic Complex scenes. SceneSelect (green) outperforms single model (red) via adaptive expert selection, achieving 54% and 58% improvements. We propose selection learning: instead of refining architectures, we learn when and which predictor to use, decoupling scene understanding from fo…

Figure 3: Overview of our Selective Learning framework. Top: The end-to-end pipeline processing augmented trajectories (via scaling, rotation, translation) through the Scene Context Encoder and SceneSelect to route inputs to the optimal model. Bottom: Detailed modules. (a) Scene Context Encoder translates raw features into a high-dimensional sparse matrix via random projection and sparsification, grouped into pseud…

Figure 4: Qualitative trajectory prediction comparison on ETH-UCY dataset across five scenes (Eth, Hotel, Zara01, Zara02, Univ). Each row shows predictions from a different method (Ours, SocialMOIF, Certified HTP, SingularTraj). Heat maps visualize prediction probability distributions (blue = low, red = high). Blue circles indicate observed trajectories, red circles mark best predictions, green circles show ground …

Original abstract

Accurate trajectory prediction is fundamentally challenging due to high scene heterogeneity - the severe variance in motion velocity, spatial density, and interaction patterns across different real-world environments. However, most existing approaches typically train a single unified model, expecting a fixed-capacity architecture to generalize universally across all possible scenarios. This conventional model-centric paradigm is fundamentally flawed when confronting such extreme heterogeneity, inevitably leading to a severe generalization gap, degraded accuracy, and massive computational waste. To overcome this bottleneck, rather than refining restricted model-centric architectures, we propose selective learning, a novel scene-centric paradigm. It explicitly analyzes the characteristics of the underlying scene to dynamically route inputs to the most appropriate expert models. As a concrete implementation of this paradigm, we introduce SceneSelect. Specifically, SceneSelect utilizes unsupervised clustering on interpretable geometric and kinematic features to discover a latent scene taxonomy. A highly decoupled classification module is then trained to assign real-time inputs to these scene categories, and a highly extensible, plug-and-play scheduling policy automatically dispatches the trajectory sequence to the optimal expert predictor. Crucially, this decoupled design ensures excellent generalization capabilities, allowing seamless integration with different off-the-shelf models and robust adaptation across new datasets without requiring computationally expensive joint retraining. Extensive experiments on three public benchmarks (ETH-UCY, SDD, and NBA) demonstrate that our method consistently outperforms strong single-model and ensemble baselines, achieving an average improvement of 10.5%, showcasing the effectiveness of scene-aware selective learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes SceneSelect, a scene-centric selective learning framework for trajectory prediction that addresses scene heterogeneity via unsupervised clustering on geometric and kinematic features to discover a latent scene taxonomy. A decoupled classification module assigns inputs to these categories in real time, and a scheduling policy routes each trajectory to the most appropriate expert predictor. The design is presented as extensible and generalizable without joint retraining. Experiments on ETH-UCY, SDD, and NBA benchmarks are claimed to show consistent outperformance over single-model and ensemble baselines with an average 10.5% improvement.

Significance. If the empirical results hold and the unsupervised clusters demonstrably align with distinct optimal experts, the work would provide a practical alternative to monolithic models for handling high-variance real-world scenes in trajectory forecasting. The decoupled, plug-and-play architecture is a clear strength, enabling integration with off-the-shelf predictors and adaptation across datasets without expensive retraining.

major comments (2)
  1. [Abstract] The asserted 10.5% average improvement is stated without any quantitative details on clustering stability, scene-classification accuracy, error bars, baseline implementations, or statistical significance tests. The full methods and results sections must be examined to determine whether the data actually support the claim that scene-aware selection, rather than added capacity or ensembling effects, drives the gains.
  2. [Methods / Clustering and Scheduling] The central claim requires that the discovered scene taxonomy partitions the data such that each category exhibits a measurably different best expert. No analysis is supplied showing per-cluster expert performance gaps, consistency of optimal-expert assignments, or ablation results that isolate the contribution of scene-aware routing versus simple ensembling.
minor comments (1)
  1. [Abstract] The repeated use of 'highly decoupled' and 'highly extensible' would benefit from precise definitions of the training objectives and interface contracts between the clustering, classification, and scheduling modules.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript on SceneSelect. We address each major comment point by point below, drawing from the full manuscript while indicating specific revisions that will be incorporated to strengthen the presentation and supporting evidence.

  1. Referee: [Abstract] The asserted 10.5% average improvement is stated without any quantitative details on clustering stability, scene-classification accuracy, error bars, baseline implementations, or statistical significance tests. The full methods and results sections must be examined to determine whether the data actually support the claim that scene-aware selection, rather than added capacity or ensembling effects, drives the gains.

    Authors: The full manuscript details the experimental protocol, baseline implementations, and results across ETH-UCY, SDD, and NBA. To improve transparency as requested, we will revise the abstract and results section to explicitly report clustering stability metrics, scene-classification accuracy, error bars from multiple runs, and statistical significance tests. We will additionally include an ablation comparing SceneSelect against a non-routed ensemble of all experts to isolate the contribution of scene-aware routing from capacity or ensembling effects. revision: yes

  2. Referee: [Methods / Clustering and Scheduling] The central claim requires that the discovered scene taxonomy partitions the data such that each category exhibits a measurably different best expert. No analysis is supplied showing per-cluster expert performance gaps, consistency of optimal-expert assignments, or ablation results that isolate the contribution of scene-aware routing versus simple ensembling.

    Authors: The manuscript describes unsupervised clustering on geometric and kinematic features followed by per-cluster expert training and scheduling based on validation performance. We agree that explicit supporting analyses are needed. In the revision we will add per-cluster performance tables and visualizations demonstrating expert gaps and optimal assignments per category, consistency checks across clustering runs or data folds, and ablations that directly contrast selective routing against simple ensembling or random dispatching. revision: yes
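
The ablation promised here (selective routing versus simple ensembling versus random dispatching) could be set up along the following lines. This is a hedged sketch, not the authors' protocol: it assumes one deterministic prediction per expert rather than best-of-K sampling, uses the standard ADE and FDE definitions, and the names `compare_dispatch` and `routed_idx` as well as the toy predictions are hypothetical.

```python
import numpy as np

def ade(pred, truth):
    """Average displacement error: mean L2 distance over all predicted steps."""
    return float(np.linalg.norm(pred - truth, axis=-1).mean())

def fde(pred, truth):
    """Final displacement error: mean L2 distance at the last predicted step."""
    return float(np.linalg.norm(pred[..., -1, :] - truth[..., -1, :], axis=-1).mean())

def compare_dispatch(preds, truth, routed_idx, seed=0):
    """preds: (n_experts, n_samples, steps, 2); truth: (n_samples, steps, 2);
    routed_idx: (n_samples,) expert chosen by the router for each sample."""
    n_experts, n_samples = preds.shape[:2]
    rows = np.arange(n_samples)
    random_idx = np.random.default_rng(seed).integers(n_experts, size=n_samples)
    # Per-expert, per-sample ADE, used for the oracle (best expert in hindsight).
    per_sample = np.linalg.norm(preds - truth[None], axis=-1).mean(axis=-1)
    return {
        "routed": ade(preds[routed_idx, rows], truth),
        "ensemble_mean": ade(preds.mean(axis=0), truth),
        "random": ade(preds[random_idx, rows], truth),
        "oracle": float(per_sample.min(axis=0).mean()),
    }

# Toy data: ground truth at the origin, each expert off by a constant x-offset.
truth = np.zeros((4, 3, 2))
offsets = np.array([[0.1, 0.1, 1.0, 1.0],    # expert 0: good on samples 0-1
                    [1.0, 1.0, 0.1, 0.1]])   # expert 1: good on samples 2-3
preds = truth[None] + offsets[:, :, None, None] * np.array([1.0, 0.0])
results = compare_dispatch(preds, truth, routed_idx=np.array([0, 0, 1, 1]))
print(results, fde(preds[0], truth))
```

Reporting the oracle alongside the routed score shows how much of the available headroom the router captures, while the gap between routed and ensemble scores is what would separate scene-aware selection from a pure capacity effect.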

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's core derivation proceeds from unsupervised clustering on geometric and kinematic features to discover scene categories, followed by independent training of a decoupled classification module and a plug-and-play scheduling policy that routes to off-the-shelf expert predictors. These stages are described as separate and extensible without any equations or definitions that set the final performance metric (e.g., ADE/FDE gains) equal to the clustering or scheduling inputs by construction. Reported improvements on ETH-UCY, SDD, and NBA benchmarks are presented as empirical outcomes rather than forced by the method's setup, and no self-citations, uniqueness theorems, or ansatzes are invoked to justify load-bearing choices. The approach remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that geometric and kinematic features suffice to reveal meaningful scene categories and that expert specialization yields additive gains without retraining. No free parameters or invented entities are quantified in the abstract.

axioms (1)
  • domain assumption: High scene heterogeneity causes severe generalization gaps in single unified trajectory models
    Stated as the fundamental flaw of the conventional model-centric paradigm in the opening sentences of the abstract.

pith-pipeline@v0.9.0 · 5796 in / 1280 out tokens · 46148 ms · 2026-05-22T09:55:09.074099+00:00 · methodology


