Recognition: no theorem link
TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond
Pith reviewed 2026-05-11 00:55 UTC · model grok-4.3
The pith
TraXion is a pre-training framework built to satisfy three axioms for multi-entity spatiotemporal event streams, allowing one checkpoint per dataset to outperform task-specific models on mobility tasks and transfer directly to security and healthcare event streams.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TraXion is a pre-training framework whose objectives and architecture are jointly designed to satisfy three axioms derived from the properties of MESES data: tuple-valued events whose meaning depends on the joint distribution over location, time, and activity; persistent user signatures across trajectories; and non-independence across users due to co-location at shared places. A single TraXion checkpoint per dataset beats task-specific baselines on every task across six public mobility datasets covering anomaly detection, next-POI recommendation, next-visit prediction, and social-link prediction. The same recipe, applied unchanged to enterprise authentication logs and ICU mortality prediction, matches or exceeds prior work on both.
What carries the argument
TraXion's objectives and architecture are jointly designed to satisfy the three axioms for multi-entity spatiotemporal event streams (MESES).
If this is right
- Mobility tasks such as anomaly detection and next-visit prediction can share one pre-trained representation instead of requiring separate models for each task.
- Event streams from security and healthcare can be modeled with the same pre-training recipe as mobility data without any changes.
- Performance improvements arise because the pre-training explicitly accounts for joint distributions over event attributes, persistent entity signatures, and inter-entity interactions.
- Cross-domain transfer becomes possible once the pre-training respects the shared structural properties of MESES data rather than importing objectives from unrelated domains.
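The implications above all lean on one shared data model. As a minimal sketch of what that model implies (class and field names are our own illustration, not the paper's), a MESES record and the co-location signal behind the third structural property might look like:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MesesEvent:
    """One tuple-valued event: its meaning lies in the joint
    (location, time, activity), not in any single field."""
    entity_id: str    # persistent signature carrier (user, host, patient)
    location: str     # shared infrastructure enables cross-entity dependence
    timestamp: float  # seconds since epoch
    activity: str     # what the entity did at that place and time


def co_located(a: MesesEvent, b: MesesEvent, window_s: float = 3600.0) -> bool:
    """The third property in miniature: two distinct entities interact
    when they visit the same place within a time window."""
    return (a.entity_id != b.entity_id
            and a.location == b.location
            and abs(a.timestamp - b.timestamp) <= window_s)
```

The `window_s` threshold is a placeholder; any real co-location definition would come from the paper's own objectives.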
Where Pith is reading between the lines
- The same axiomatic approach could extend to other MESES-like data such as financial transactions or sensor networks where entities interact through shared infrastructure.
- If the axioms prove general, future models for sequential event data may need less domain-specific engineering because the core structure is captured at the pre-training stage.
- Testing which of the three axioms contributes most to gains on particular tasks would clarify whether all three are required or if subsets suffice for narrower applications.
Load-bearing premise
The three structural properties of MESES data can be turned into axioms whose satisfaction by TraXion's objectives and architecture is both necessary and sufficient for the observed performance gains, with no domain-specific post-processing or hyper-parameter search required.
What would settle it
Training TraXion on a new MESES dataset from a different domain and finding that it underperforms a task-specific baseline on at least one task or requires domain-specific hyperparameter tuning to match prior results.
Original abstract
Human mobility differs from text and from generic time series in three structural ways: visits are tuple-valued events whose meaning depends on the joint distribution over location, time, and activity; users carry persistent signatures across trajectories; and visits are not independent across users, since co-location at shared places is a primary signal. Existing pre-training recipes for mobility import objectives from language modeling, treating trajectories as sentences and visits as tokens, an analogy that fails against each of the three properties above. These properties define a broader class, multi-entity spatiotemporal event streams (MESES), spanning enterprise authentication logs, electronic health records, and other event-stream domains where entities share infrastructure, schedules, or contexts. We make the properties precise as three axioms that any pre-training framework for MESES should satisfy, and introduce TraXion, whose objectives and architecture are jointly designed to meet them. A single TraXion checkpoint per dataset beats task-specific baselines on every task across six public mobility datasets covering anomaly detection, next-POI recommendation, next-visit prediction, and social-link prediction. The same recipe, applied unchanged to enterprise authentication logs and ICU mortality prediction, matches or exceeds prior work on both, showing that event streams from domains as different as mobility, security, and healthcare can be modeled under a single framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that mobility trajectories and similar multi-entity spatiotemporal event streams (MESES) exhibit three structural properties—tuple-valued events whose semantics depend on joint location-time-activity distributions, persistent user signatures across trajectories, and non-independence of visits across users due to co-location—that distinguish them from text or generic time series. Existing language-modeling-style pre-training fails to respect these properties. The authors formalize the properties as three axioms that any MESES pre-training framework should satisfy, introduce TraXion whose objectives and architecture are jointly designed to meet the axioms, and report that a single TraXion checkpoint per dataset outperforms task-specific baselines on anomaly detection, next-POI recommendation, next-visit prediction, and social-link prediction across six public mobility datasets. The identical recipe, applied unchanged, matches or exceeds prior work on enterprise authentication logs and ICU mortality prediction.
Significance. If the empirical results and the necessity of the axioms are substantiated, the work would be significant: it offers a unified pre-training recipe for event-stream domains that share infrastructure or context (mobility, security, healthcare) and avoids the common practice of importing language-modeling objectives that ignore the data's joint structure and cross-entity dependencies.
major comments (3)
- [Abstract] Abstract and experimental section: the central claim that a single TraXion checkpoint beats every task-specific baseline on all tasks requires a reproducible experimental protocol, baseline implementations, statistical significance tests, and ablation results; none of these are described or referenced in the provided text, rendering the performance claims unverifiable.
- [Abstract] The argument that the three MESES axioms drive the observed gains (rather than other unablated factors such as encoder choice or training schedule) is load-bearing for the paper's contribution, yet no ablation is reported that removes or relaxes one axiom while keeping the rest of the architecture and recipe fixed and then measures the resulting performance drop on the reported tasks.
- [Abstract] Abstract: the cross-domain transfer result (authentication logs and ICU mortality) is presented as using the 'same recipe, applied unchanged,' but no details are given on whether hyper-parameter search, domain-specific post-processing, or re-tuning occurred; this information is necessary to evaluate the claim of zero-shot transfer.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important gaps in experimental documentation and validation that we will address in the revision. Below we respond point-by-point to the three major comments.
Point-by-point responses
-
Referee: [Abstract] Abstract and experimental section: the central claim that a single TraXion checkpoint beats every task-specific baseline on all tasks requires a reproducible experimental protocol, baseline implementations, statistical significance tests, and ablation results; none of these are described or referenced in the provided text, rendering the performance claims unverifiable.
Authors: We agree that the current manuscript version lacks sufficient detail for independent reproduction of the results. In the revised manuscript we will add a dedicated 'Experimental Setup' subsection that fully specifies the training protocol (including data splits, batching, optimization hyperparameters, and early-stopping criteria), provides references or links to the exact baseline implementations used, reports statistical significance (paired t-tests or Wilcoxon signed-rank tests with p-values across five random seeds), and includes the requested ablation results. These additions will make the performance claims verifiable. revision: yes
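The seed-level significance testing promised here can be sanity-checked with nothing beyond the standard library. The sketch below uses an exact paired sign-flip test, a close relative of the Wilcoxon signed-rank test the authors name; the per-seed scores are hypothetical, not from the paper:

```python
from itertools import product


def paired_signflip_pvalue(a, b):
    """Exact two-sided paired sign-flip test: under H0 each per-seed
    difference is symmetric around 0, so all 2^n sign patterns are
    equally likely; the p-value is the share of patterns whose absolute
    summed difference is at least the observed one."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    count = sum(1 for signs in product((1, -1), repeat=len(diffs))
                if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed)
    return count / 2 ** len(diffs)


# Hypothetical scores for one task over five random seeds.
traxion = [0.81, 0.83, 0.82, 0.86, 0.85]
baseline = [0.80, 0.81, 0.79, 0.82, 0.80]
print(paired_signflip_pvalue(traxion, baseline))  # 0.0625
```

With five seeds the finest achievable two-sided p-value is 2/32 = 0.0625, which is one reason the revision's choice of seed count matters for the claimed significance tests.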
-
Referee: [Abstract] The argument that the three MESES axioms drive the observed gains (rather than other unablated factors such as encoder choice or training schedule) is load-bearing for the paper's contribution, yet no ablation is reported that removes or relaxes one axiom while keeping the rest of the architecture and recipe fixed and then measures the resulting performance drop on the reported tasks.
Authors: We acknowledge that the manuscript does not yet contain ablations that isolate the contribution of each axiom. In the revision we will introduce three controlled variants of TraXion, each violating exactly one axiom while preserving the encoder, training schedule, and all other objectives: (1) an independent-event variant that factorizes the joint location-time-activity distribution, (2) a user-agnostic variant that drops persistent signature modeling, and (3) a user-independent variant that ignores co-location signals. We will report the resulting performance drops on all four mobility tasks, thereby directly linking the axioms to the observed gains. revision: yes
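A lightweight way to keep such ablations honest is to derive every variant from one shared configuration and flip exactly one flag, so nothing else can drift between runs. The sketch below is our own illustration; the flag names are hypothetical stand-ins for the three axiom-specific mechanisms, not identifiers from the paper:

```python
# Hypothetical flags for the three axiom-specific mechanisms; the encoder,
# training schedule, and all other objectives stay fixed across variants.
FULL = {
    "joint_tuple_objective": True,  # axiom 1: joint location-time-activity
    "entity_signature": True,       # axiom 2: persistent user signatures
    "colocation_links": True,       # axiom 3: cross-user co-location
}

# Which mechanism each proposed variant violates.
VIOLATES = {
    "independent-event": "joint_tuple_objective",
    "user-agnostic": "entity_signature",
    "user-independent": "colocation_links",
}


def ablation(variant: str) -> dict:
    """Copy the full config and disable the single mechanism the variant violates."""
    cfg = dict(FULL)
    cfg[VIOLATES[variant]] = False
    return cfg


for name in VIOLATES:
    # Invariant: each variant disables exactly one mechanism.
    assert sum(ablation(name).values()) == len(FULL) - 1
```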
-
Referee: [Abstract] Abstract: the cross-domain transfer result (authentication logs and ICU mortality) is presented as using the 'same recipe, applied unchanged,' but no details are given on whether hyper-parameter search, domain-specific post-processing, or re-tuning occurred; this information is necessary to evaluate the claim of zero-shot transfer.
Authors: We will clarify this point explicitly. The transfer experiments used the identical pre-training objectives, architecture, optimizer settings, and hyperparameter values as the mobility runs; the only adaptation was the minimal input tokenization required to map the new event schemas into the same vocabulary format. No hyper-parameter search, domain-specific fine-tuning, or post-processing was performed on the target domains. A new paragraph in the experimental section will document this procedure and confirm the zero-shot nature of the transfer. revision: yes
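The "minimal input tokenization" described here amounts to a schema projection: each domain's raw fields are mapped onto the shared event-tuple format while every model setting stays untouched. A hedged sketch (the schemas, field names, and sample record are invented for illustration) could look like:

```python
# Hypothetical field mappings from two target-domain schemas onto the
# shared (entity, location, time, activity) tuple format.
AUTH_SCHEMA = {"entity": "src_user", "location": "dst_host",
               "time": "epoch", "activity": "auth_type"}
ICU_SCHEMA = {"entity": "patient_id", "location": "care_unit",
              "time": "charttime", "activity": "event_code"}


def to_meses(record: dict, schema: dict) -> tuple:
    """Project a raw domain event onto the shared tuple vocabulary format."""
    return tuple(record[schema[k]] for k in ("entity", "location", "time", "activity"))


event = {"src_user": "U12", "dst_host": "C4210",
         "epoch": 151036, "auth_type": "Kerberos"}
print(to_meses(event, AUTH_SCHEMA))  # ('U12', 'C4210', 151036, 'Kerberos')
```

Under this reading, the zero-shot claim reduces to: only the `schema` dictionary changes per domain, never the pre-training recipe.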
Circularity Check
No circularity: axioms are motivated externally and performance is empirical.
Full rationale
The paper extracts three structural properties from MESES data descriptions, formalizes them as axioms, designs objectives and architecture to satisfy those axioms, and reports experimental results on held-out tasks and cross-domain datasets. No equation equates a derived quantity to a fitted parameter defined by the same metric, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work. The performance edge is presented as an observed outcome of the design rather than a mathematical identity with the inputs.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: Visits are tuple-valued events whose meaning depends on the joint distribution over location, time, and activity.
- domain assumption: Users carry persistent signatures across trajectories.
- domain assumption: Visits are not independent across users because co-location at shared places is a primary signal.
invented entities (2)
-
MESES (multi-entity spatiotemporal event streams)
no independent evidence
-
TraXion
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Urban Anomalies: A simulated human mobility dataset with injected anomalies
Hossein Amiri, Ruochen Kong, and Andreas Züfle. Urban Anomalies: A simulated human mobility dataset with injected anomalies. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection, pages 1–11, 2024
2024
-
[2]
Claude Code. https://www.anthropic.com/claude-code, 2025
Anthropic. Claude Code. https://www.anthropic.com/claude-code, 2025
2025
-
[3]
ICAD: A self-supervised autoregressive approach for multi-context anomaly detection in human mobility data
Bita Azarijoo, Maria Despoina Siampou, John Krumm, and Cyrus Shahabi. ICAD: A self-supervised autoregressive approach for multi-context anomaly detection in human mobility data. In Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, pages 595–606, 2025
2025
-
[4]
Contrastive trajectory similarity learning with dual-feature attention
Yanchuan Chang, Jianzhong Qi, Yuxuan Liang, and Egemen Tanin. Contrastive trajectory similarity learning with dual-feature attention. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2933–2945. IEEE, 2023
2023
-
[5]
Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1):6085, 2018
Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1):6085, 2018
2018
-
[6]
Mutual distillation learning network for trajectory-user linking
Wei Chen, Shuzhe Li, Chao Huang, Yanwei Yu, Yongguo Jiang, and Junyu Dong. Mutual distillation learning network for trajectory-user linking. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2022
2022
-
[7]
Trajectory-user linking via hierarchical spatio-temporal attention networks. ACM Transactions on Knowledge Discovery from Data, 18(4):1–22, 2024
Wei Chen, Chao Huang, Yanwei Yu, Yongguo Jiang, and Junyu Dong. Trajectory-user linking via hierarchical spatio-temporal attention networks. ACM Transactions on Knowledge Discovery from Data, 18(4):1–22, 2024
2024
-
[8]
Friendship and mobility: user movement in location-based social networks
Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082–1090, 2011
2011
-
[9]
One model, many cities: A transferable social relationship inference framework for human mobility data
Chen Chu, Cyrus Shahabi, Emmanuel Tung, and Khurram Shafique. One model, many cities: A transferable social relationship inference framework for human mobility data. In Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, pages 66–76, 2025
2025
-
[10]
ELECTRA: Pre-training text encoders as discriminators rather than generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. ELECTRA: Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations, 2020
2020
-
[11]
BERT: Pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019
2019
-
[12]
SimMTM: A simple pre-training framework for masked time-series modeling. Advances in Neural Information Processing Systems, 36:29996–30025, 2023
Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. SimMTM: A simple pre-training framework for masked time-series modeling. Advances in Neural Information Processing Systems, 36:29996–30025, 2023
2023
-
[13]
DeepLog: Anomaly detection and diagnosis from system logs through deep learning
Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1285–1298, 2017
2017
-
[14]
Dirichlet-Hawkes processes with applications to clustering continuous-time document streams
Nan Du, Mehrdad Farajtabar, Amr Ahmed, Alexander J Smola, and Le Song. Dirichlet-Hawkes processes with applications to clustering continuous-time document streams. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 219–228, 2015
2015
-
[15]
Recurrent marked temporal point processes: Embedding event history to vector
Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. Recurrent marked temporal point processes: Embedding event history to vector. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1555–1564, 2016
2016
-
[16]
Back to Bayesics: Uncovering human mobility distributions and anomalies with an integrated statistical and neural framework
Minxuan Duan, Yinlong Qian, Lingyi Zhao, Zihao Zhou, Zeeshan Rasheed, Rose Yu, and Khurram Shafique. Back to Bayesics: Uncovering human mobility distributions and anomalies with an integrated statistical and neural framework. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection, pages 56–67, 2024
2024
-
[17]
AgentMove: A large language model based agentic framework for zero-shot next location prediction
Jie Feng, Yuwei Du, Jie Zhao, and Yong Li. AgentMove: A large language model based agentic framework for zero-shot next location prediction. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1322–1338, 2025
2025
-
[18]
Improving event representation via simultaneous weakly supervised contrastive learning and clustering
Jun Gao, Wei Wang, Changlong Yu, Huan Zhao, Wilfred Ng, and Ruifeng Xu. Improving event representation via simultaneous weakly supervised contrastive learning and clustering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
2022
-
[19]
Mobility-LLM: Learning visiting intentions and travel preference from human mobility data with large language models
Letian Gong, Yan Lin, Xinyue Zhang, Yiwen Lu, Xuedi Han, Yichen Liu, Shengnan Guo, Youfang Lin, and Huaiyu Wan. Mobility-LLM: Learning visiting intentions and travel preference from human mobility data with large language models. Advances in Neural Information Processing Systems, 37:36185–36217, 2024
2024
-
[20]
TabR: Tabular deep learning meets nearest neighbors
Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. TabR: Tabular deep learning meets nearest neighbors. In The Twelfth International Conference on Learning Representations, 2024
2024
-
[21]
LogBERT: Log anomaly detection via BERT
Haixuan Guo, Shuhan Yuan, and Xintao Wu. LogBERT: Log anomaly detection via BERT. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021
2021
-
[22]
WaveGNN: Integrating graph neural networks and transformers for decay-aware classification of irregular clinical time-series
Arash Hajisafi, Maria Despoina Siampou, Bita Azarijoo, Zhen Xiong, and Cyrus Shahabi. WaveGNN: Integrating graph neural networks and transformers for decay-aware classification of irregular clinical time-series. In 2025 IEEE International Conference on Big Data (BigData), pages 1934–1943, 2025. doi: 10.1109/BigData66926.2025.11401906
-
[23]
LogGPT: Log anomaly detection via GPT
Xiao Han, Shuhan Yuan, and Mohamed Trabelsi. LogGPT: Log anomaly detection via GPT. In 2023 IEEE International Conference on Big Data (BigData), pages 1117–1122, 2023
2023
-
[24]
MobilityGPT: Enhanced human mobility modeling with a GPT model. IEEE Transactions on Intelligent Transportation Systems, 27(1):1681–1694, 2025
Ammar Haydari, Dongjie Chen, Zhengfeng Lai, Michael Zhang, and Chen-Nee Chuah. MobilityGPT: Enhanced human mobility modeling with a GPT model. IEEE Transactions on Intelligent Transportation Systems, 27(1):1681–1694, 2025
2025
-
[25]
Masked autoencoders are scalable vision learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022
2022
-
[26]
TrajGPT: Controlled synthetic trajectory generation using a multitask transformer-based spatiotemporal model
Shang-Ling Hsu, Emmanuel Tung, John Krumm, Cyrus Shahabi, and Khurram Shafique. TrajGPT: Controlled synthetic trajectory generation using a multitask transformer-based spatiotemporal model. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, pages 362–371, 2024
2024
-
[27]
eICU Collaborative Research Database Demo. PhysioNet, May 2021
Alistair Johnson, Tom Pollard, Omar Badawi, and Jesse Raffa. eICU Collaborative Research Database Demo. PhysioNet, May 2021. doi: 10.13026/4mxk-na84. URL https://doi.org/10.13026/4mxk-na84. Version 2.0.1
-
[28]
Self-attentive sequential recommendation
Wang-Cheng Kang and Julian McAuley. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM), pages 197–206, 2018. doi: 10.1109/ICDM.2018.00035
-
[29]
autoresearch. https://github.com/karpathy/autoresearch, 2025
Andrej Karpathy. autoresearch. https://github.com/karpathy/autoresearch, 2025
2025
-
[30]
Time2Vec: Learning a Vector Representation of Time
Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2Vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321, 2019
2019
-
[31]
Comprehensive, multi-source cyber-security events data set
Alexander D Kent. Comprehensive, multi-source cyber-security events data set. Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2015
2015
-
[32]
Similarity of neural network representations revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, 2019
2019
-
[33]
Prototypical contrastive learning of unsupervised representations
Junnan Li, Pan Zhou, Caiming Xiong, and Steven C.H. Hoi. Prototypical contrastive learning of unsupervised representations. In International Conference on Learning Representations, 2021
2021
-
[34]
TrajFlow: Nation-wide pseudo GPS trajectory generation with flow matching models
Peiran Li, Jiawei Wang, Haoran Zhang, Xiaodan Shi, Noboru Koshizuka, Chihiro Shimizu, and Renhe Jiang. TrajFlow: Nation-wide pseudo GPS trajectory generation with flow matching models. In The Fourteenth International Conference on Learning Representations, 2026
2026
-
[35]
BEHRT: transformer for electronic health records. Scientific Reports, 10(1):7155, 2020
Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: transformer for electronic health records. Scientific Reports, 10(1):7155, 2020
2020
-
[36]
Heterogeneous hyperbolic hypergraph neural network for friend recommendation in location-based social networks. ACM Transactions on Knowledge Discovery from Data, 19(3):1–29, 2025
Yongkang Li, Zipei Fan, and Xuan Song. Heterogeneous hyperbolic hypergraph neural network for friend recommendation in location-based social networks. ACM Transactions on Knowledge Discovery from Data, 19(3):1–29, 2025
2025
-
[37]
Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021
Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021
2021
-
[38]
Pre-training context and time aware location embeddings from spatial-temporal trajectories for user next location prediction
Yan Lin, Huaiyu Wan, Shengnan Guo, and Youfang Lin. Pre-training context and time aware location embeddings from spatial-temporal trajectories for user next location prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4241–4248, 2021
2021
-
[39]
Discovering latent network structure in point process data
Scott Linderman and Ryan Adams. Discovering latent network structure in point process data. In International Conference on Machine Learning, pages 1413–1421. PMLR, 2014
2014
-
[40]
iTransformer: Inverted transformers are effective for time series forecasting
Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. iTransformer: Inverted transformers are effective for time series forecasting. In The Twelfth International Conference on Learning Representations, 2024
2024
-
[41]
Decoupled weight decay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019
2019
-
[42]
Multi-scale representation learning for spatial feature distributions using grid cells
Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, and Ni Lao. Multi-scale representation learning for spatial feature distributions using grid cells. In International Conference on Learning Representations (ICLR), 2020
2020
-
[43]
UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 2018
Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 2018
2018
-
[44]
The neural Hawkes process: A neurally self-modulating multivariate point process. Advances in Neural Information Processing Systems, 30, 2017
Hongyuan Mei and Jason M Eisner. The neural Hawkes process: A neurally self-modulating multivariate point process. Advances in Neural Information Processing Systems, 30, 2017
2017
-
[45]
Self-supervised log parsing
Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, and Odej Kao. Self-supervised log parsing. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020
2020
-
[46]
A time series is worth 64 words: Long-term forecasting with transformers
Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023
2023
-
[47]
CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks
Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks. In Machine Learning for Health, pages 239–260. PMLR, 2021
2021
-
[48]
EBM: an entropy-based model to infer social strength from spatiotemporal data
Huy Pham, Cyrus Shahabi, and Yan Liu. EBM: an entropy-based model to infer social strength from spatiotemporal data. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 265–276, 2013
2013
-
[49]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digital Medicine, 4(1):86, 2021
Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digital Medicine, 4(1):86, 2021
2021
-
[50]
Zero shot health trajectory prediction using transformer
Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, and Arkadiusz Sitek. Zero shot health trajectory prediction using transformer. NPJ Digital Medicine, 7(1):256, 2024
2024
-
[51]
Neural temporal point processes: A review
Oleksandr Shchur, Ali Caner Türkmen, Tim Januschowski, and Stephan Günnemann. Neural temporal point processes: A review. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, pages 4585–4593. International Joint Conferences on Artificial Intelligence Organization, 2021
2021
-
[52]
Multi-time attention networks for irregularly sampled time series
Satya Narayan Shukla and Benjamin Marlin. Multi-time attention networks for irregularly sampled time series. In International Conference on Learning Representations, 2021
2021
-
[53]
Mobility-Embedded POIs: Learning what a place is and how it is used from human movement
Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu, Neha Arora, and Cyrus Shahabi. Mobility-Embedded POIs: Learning what a place is and how it is used from human movement. In Forty-third International Conference on Machine Learning, 2026. To appear
2026
-
[54]
NUMOSIM: A synthetic mobility dataset with anomaly detection benchmarks
Chris Stanford, Suman Adari, Xishun Liao, Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma, Emmanuel Tung, Yinlong Qian, Lingyi Zhao, et al. NUMOSIM: A synthetic mobility dataset with anomaly detection benchmarks. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection, pages 68–78, 2024
2024
-
[55]
BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1441–1450, 2019
2019
-
[56]
Self-supervised transformer for sparse and irregularly sampled multivariate clinical time-series. ACM Transactions on Knowledge Discovery from Data (TKDD), 16(6):1–17, 2022
Sindhu Tipirneni and Chandan K Reddy. Self-supervised transformer for sparse and irregularly sampled multivariate clinical time-series. ACM Transactions on Knowledge Discovery from Data (TKDD), 16(6):1–17, 2022
2022
-
[57]
Deep learning for unsupervised insider threat detection in structured cybersecurity data streams
Aaron Tuor, Samuel Kaplan, Brian Hutchinson, Nicole Nichols, and Sean Robinson. Deep learning for unsupervised insider threat detection in structured cybersecurity data streams. In AAAI Workshops, pages 224–231, 2017
2017
-
[58]
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018
2018
-
[59]
Pre-training time-aware location embeddings from spatial-temporal trajectories. IEEE Transactions on Knowledge and Data Engineering, 34(11):5510–5523, 2021
Huaiyu Wan, Yan Lin, Shengnan Guo, and Youfang Lin. Pre-training time-aware location embeddings from spatial-temporal trajectories. IEEE Transactions on Knowledge and Data Engineering, 34(11):5510–5523, 2021
2021
-
[60]
CoBAD: Modeling collective behaviors for human mobility anomaly detection
Haomin Wen, Shurui Cao, and Leman Akoglu. CoBAD: Modeling collective behaviors for human mobility anomaly detection. In Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, pages 197–209, 2025
2025
-
[61]
Uncertainty-aware spatio-temporal human mobility modeling and anomaly detection
Haomin Wen, Shurui Cao, Zeeshan Rasheed, Khurram Hassan Shafique, and Leman Akoglu. Uncertainty-aware spatio-temporal human mobility modeling and anomaly detection. In Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, pages 328–331, 2025
2025
-
[62] Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. Contrastive learning for sequential recommendation. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 1259–1273. IEEE, 2022.
[63] Bada Xin, Xin Wan, Zhuojun Jiang, Faqiang Liu, Su Chen, Rong Yang, and Qingyun Liu. COAST: Contrastive learning with augmented spatio-temporal encoding for next POI recommendation. In ICASSP 2025–2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.
[64] Xiaohang Xu, Renhe Jiang, Chuang Yang, Zipei Fan, and Kaoru Sezaki. Taming the long tail in human mobility prediction. Advances in Neural Information Processing Systems, 37:54748–54771, 2024.
[65] Yuanbo Xu, Hongxu Shen, Yiheng Jiang, and En Wang. Where and when: predict next POI and its explicit timestamp in sequential recommendation. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, pages 3507–3515, 2025.
[66] Hao Xue, Flora Salim, Yongli Ren, and Nuria Oliver. MobTCast: Leveraging auxiliary trajectory forecasting for human mobility prediction. Advances in Neural Information Processing Systems, 34:30380–30391, 2021.
[67] Chenghao Yang, Hongyuan Mei, and Jason Eisner. Transformer embeddings of irregularly spaced events and their participants. In International Conference on Learning Representations, 2022.
[68] Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudre-Mauroux. Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach. In The World Wide Web Conference, pages 2147–2157, 2019.
[69] Song Yang, Jiamou Liu, and Kaiqi Zhao. GETNext: trajectory flow map enhanced transformer for next POI recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1144–1153, 2022.
[70] Yuan Yuan, Jingtao Ding, Jie Feng, Depeng Jin, and Yong Li. UniST: A prompt-empowered universal model for urban spatio-temporal prediction. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4095–4106, 2024.
[71] Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. TS2Vec: Towards universal representation of time series. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8980–8987, 2022.
[72] George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, and Carsten Eickhoff. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2114–2124, 2021.
[73] Hao Zhang, Wei Chen, Xingyu Zhao, Jianpeng Qi, Guiyuan Jiang, and Yanwei Yu. Scalable trajectory-user linking with dual-stream representation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 13224–13232, 2025.
[74] Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, and Marinka Zitnik. Graph-guided network for irregularly sampled multivariate time series. In International Conference on Learning Representations, 2022.
[75] Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations, 2023.
[76] Fan Zhou, Qiang Gao, Goce Trajcevski, Kunpeng Zhang, Ting Zhong, and Fengli Zhang. Trajectory-user linking via variational autoencoder. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 3212–3218, 2018.
[77] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional Hawkes processes. In International Conference on Machine Learning, pages 1301–1309. PMLR, 2013.
[78] Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 1893–1902, 2020.
[79] Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. One fits all: Power general time series analysis by pretrained LM. Advances in Neural Information Processing Systems, 36:43322–43355, 2023.
[80] Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Xun Zhou, Liang Han, Xuetao Wei, and Yuxuan Liang. UniTraj: Learning a universal trajectory foundation model from billion-scale worldwide traces. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.