MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
Pith reviewed 2026-05-14 20:13 UTC · model grok-4.3
The pith
Large language models can exploit irregular sampling patterns in multimodal time series by representing them as XML triplets and using two-stage fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MILM represents multimodal irregular time series as time-ordered triplets in XML format and fine-tunes large language models through a two-stage process. The first stage trains on value-redacted MITS to isolate learning from sampling patterns, while the second stage trains on full MITS to jointly model patterns together with observed numerical and textual content. The resulting two-stage model achieves the best average performance across EHR datasets, with value-redaction tests confirming that sampling patterns carry independent predictive signal and that the model learns to use them.
What carries the argument
The XML representation of MITS as time-ordered triplets combined with a two-stage fine-tuning process that first isolates learning from sampling patterns before integrating full observations.
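To make the representation concrete, here is a minimal Python sketch of the triplet serialization and the stage-one value redaction. The tag names (`<obs>`, `<time>`, `<channel>`, `<value>`), the time unit, and the `[REDACTED]` placeholder are assumptions; the review states only that MITS become time-ordered triplets in XML and that stage one trains on value-redacted inputs.

```python
from dataclasses import dataclass
from typing import Union
import re

@dataclass
class Observation:
    time: float                # e.g. hours since admission (assumed unit)
    channel: str               # lab test name or "clinical_note"
    value: Union[float, str]   # numeric measurement or note text

def to_xml(observations: list) -> str:
    """Serialize a MITS sample as time-ordered XML triplets."""
    return "\n".join(
        f"<obs><time>{o.time:.1f}</time>"
        f"<channel>{o.channel}</channel>"
        f"<value>{o.value}</value></obs>"
        for o in sorted(observations, key=lambda o: o.time)
    )

def redact_values(xml: str) -> str:
    """Stage-1 input: hide every value, keep time and channel."""
    return re.sub(r"<value>.*?</value>", "<value>[REDACTED]</value>", xml, flags=re.S)

series = [
    Observation(3.2, "heart_rate", 88),
    Observation(0.5, "lactate", 2.1),
    Observation(4.0, "clinical_note", "patient reports chest pain"),
]
stage2_input = to_xml(series)               # full MITS (stage 2)
stage1_input = redact_values(stage2_input)  # value-redacted MITS (stage 1)
```

The two-stage schedule then amounts to fine-tuning first on `stage1_input`-style sequences and afterwards on `stage2_input`-style sequences for the same classification targets.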
If this is right
- The two-stage model achieves the best average performance across multiple EHR datasets.
- The single-stage counterpart ranks second best on the same tasks.
- Value-redaction evaluations confirm that sampling patterns alone carry usable predictive signal.
- In value-pending settings the two-stage model outperforms the direct model by a larger margin than in standard evaluation.
- Preserving the time and channel of pending observations further improves in-hospital mortality prediction (see the sketch after this list).
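A minimal sketch of the value-pending rendering referenced in the last two bullets, assuming the same XML layout as above; the `[PENDING]` marker is hypothetical, not the paper's token.

```python
def render_observation(time: float, channel: str, value, pending: bool = False) -> str:
    """Keep time and channel as sampling information even when the value is unavailable."""
    shown = "[PENDING]" if pending else value
    return (f"<obs><time>{time:.1f}</time>"
            f"<channel>{channel}</channel>"
            f"<value>{shown}</value></obs>")

# A troponin draw whose result has not come back at prediction time:
print(render_observation(6.5, "troponin", None, pending=True))
# -> <obs><time>6.5</time><channel>troponin</channel><value>[PENDING]</value></obs>
```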
Where Pith is reading between the lines
- The same XML triplet encoding could let language models handle irregular multimodal data in non-healthcare domains without custom architectures.
- The focus on sampling patterns points to possible uses in systems that actively decide which next measurements to request.
- The method suggests pretrained language models can serve as a flexible base for sparse, heterogeneous observation streams.
Load-bearing premise
That representing irregular time series as XML triplets preserves enough temporal and channel structure for pretrained language models to learn from sampling patterns without major loss.
What would settle it
An experiment in which the two-stage model is retrained using plain text instead of XML triplets and its performance on EHR tasks falls to match or below standard baselines.
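For contrast, the proposed ablation would replace the XML triplets with a plain-text serialization of the same content. A hypothetical rendering (exact phrasing is an assumption), reusing the duck-typed `Observation` records from the earlier sketch:

```python
def to_plain_text(observations) -> str:
    """Same content as the XML triplets, without explicit tag structure."""
    return "; ".join(
        f"at {obs.time:.1f}h, {obs.channel} = {obs.value}"
        for obs in sorted(observations, key=lambda o: o.time)
    )

# e.g. "at 0.5h, lactate = 2.1; at 3.2h, heart_rate = 88; ..."
```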
Original abstract
Multimodal irregular time series (MITS) consist of asynchronous and irregularly sampled observations from heterogeneous numerical and textual channels. In healthcare, for example, patients' electronic health records (EHR) include irregular lab measurements and clinical notes. The irregular timing and channel patterns of observations carry predictive signal alongside the numerical values and textual content. LLMs are natural candidates for processing such heterogeneous data, given their extensive pretrained knowledge spanning textual and numerical domains. We introduce MILM (Multimodal Irregular time series Language Model), which represents MITS as time-ordered triplets in Extensible Markup Language (XML) format and fine-tunes an LLM through a two-stage strategy for MITS classification. The first stage trains on value-redacted MITS to predict from sampling patterns alone, and the second stage trains on full MITS to jointly model sampling patterns and observed values. Our two-stage model (MILM-2S) and its single-stage counterpart (MILM-Direct) achieve the best and second-best average performance on multiple EHR datasets. Further value redaction evaluations confirm that sampling patterns carry predictive signal and that MILM-2S learns to exploit them. In the value pending evaluation we introduce, where some values are unavailable at prediction time, MILM-2S outperforms MILM-Direct by a larger margin compared to standard evaluation. For MILM-2S, preserving the time and channel of value-pending observations as additional sampling information further improves in-hospital mortality prediction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MILM, a method for multimodal irregular time series (MITS) such as EHR data. It serializes observations as time-ordered XML triplets and fine-tunes LLMs via a two-stage strategy: first training on value-redacted MITS to learn from sampling patterns alone, then on full data to jointly model patterns and values. The central claims are that MILM-2S (two-stage) and MILM-Direct (single-stage) achieve the best and second-best average performance across multiple EHR datasets, that value-redaction evaluations confirm sampling patterns carry predictive signal which MILM-2S exploits, and that MILM-2S shows larger gains in a value-pending evaluation where some values are unavailable at prediction time.
Significance. If the performance claims and exploitation results hold under rigorous evaluation, the work would offer a practical way to leverage pretrained LLMs for heterogeneous irregular data in healthcare by explicitly modeling informative sampling via two-stage training and XML serialization. The value-pending evaluation protocol is a useful addition for realistic incomplete-data settings. Significance is tempered by the need for concrete numerical support and verification that the serialization preserves usable temporal structure.
major comments (3)
- [Abstract / Results] The claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.
- [Method (XML representation)] The central assumption that time-ordered XML triplets let the pretrained LLM capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text (see the tokenizer sketch after this list), which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.
- [Experiments (value redaction / value-pending)] These evaluations are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on the redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.
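The tokenization concern in the second major comment is easy to inspect directly. The sketch below uses GPT-2's tokenizer via the `transformers` library as a stand-in, since the review does not name the backbone model's tokenizer:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
triplet = "<obs><time>3.2</time><channel>heart_rate</channel><value>88</value></obs>"
print(tok.tokenize(triplet))
# The tags and the timestamp "3.2" fragment into several subword pieces,
# so an inter-event delta is generally not represented by a single token.
```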
minor comments (2)
- Notation for the two variants (MILM-2S vs. MILM-Direct) should be introduced earlier and used consistently to avoid reader confusion when comparing the two-stage and single-stage results.
- The paper would benefit from a brief discussion of context-window limitations and how long sequences of triplets are handled, as this directly affects the temporal-structure concern.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of results, clarify methodological assumptions, and provide additional experimental details.
Point-by-point responses
Referee: [Abstract / Results] The claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.
Authors: We agree that the abstract would benefit from explicit numerical support. In the revised manuscript we have added the average AUROC scores (with standard deviations) for MILM-2S and MILM-Direct, the identity of the strongest baseline, and a brief statement of the two-stage improvement. The main results tables already report per-dataset scores, standard deviations, and ablation comparisons; we have now highlighted statistical significance tests (paired t-tests) in the text and caption to make the load-bearing nature of the two-stage gain clearer. revision: yes
Referee: [Method (XML representation)] The central assumption that time-ordered XML triplets let the pretrained LLM capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text, which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.
Authors: We acknowledge that tokenization can in principle lose precision. However, the value-redaction experiments provide direct empirical evidence that the serialized format retains usable sampling information: MILM-2S trained only on redacted XML still outperforms MILM-Direct on the same redacted inputs, and attention maps (now included in the appendix) show non-trivial attention on the explicit <time> and <channel> tags. We have added a short paragraph in Section 3.1 explaining that numeric deltas are encoded as literal strings (e.g., “delta=3.2”) and that the model is fine-tuned to treat them as distinct tokens, together with a control experiment that randomizes the order of triplets and shows a clear drop in performance. revision: partial
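A minimal sketch of the triplet-order randomization control the authors describe, assuming the XML layout from the earlier sketch; the paper's exact procedure may differ.

```python
import random
import re

def shuffle_triplets(xml: str, seed: int = 0) -> str:
    """Destroy temporal ordering while keeping the token content fixed.

    A performance drop on shuffled inputs suggests the model
    uses the time-ordered structure, not just the bag of triplets.
    """
    triplets = re.findall(r"<obs>.*?</obs>", xml, flags=re.S)
    rng = random.Random(seed)
    rng.shuffle(triplets)
    return "\n".join(triplets)
```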
Referee: [Experiments (value redaction / value-pending)] These evaluations are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on the redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.
Authors: We have expanded both evaluation sections. The revised text now specifies: (i) the exact redaction procedure (randomly masking 30% of values while keeping all timestamps and channels), (ii) per-dataset AUROC deltas with standard deviations between MILM-2S and MILM-Direct, (iii) dataset statistics (number of patients, average sequence length, missingness rates), and (iv) a context-window control that truncates all sequences to the same token budget. The larger margin observed in the value-pending setting remains after these controls, supporting the claim that MILM-2S exploits the preserved sampling metadata. revision: yes
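Under the parameters stated in this response (30% random value masking, a shared token budget), the redaction and truncation steps could be sketched as follows; the function names, `[REDACTED]` placeholder, and tokenizer interface are assumptions.

```python
import random
import re

def redact_fraction(xml: str, fraction: float = 0.3, seed: int = 0) -> str:
    """Mask a random fraction of <value> fields, keeping all timestamps and channels."""
    values = re.findall(r"<value>.*?</value>", xml, flags=re.S)
    rng = random.Random(seed)
    masked = set(rng.sample(range(len(values)), k=int(fraction * len(values))))
    counter = iter(range(len(values)))

    def _replace(match):
        # Mask only the sampled occurrences, in document order.
        return "<value>[REDACTED]</value>" if next(counter) in masked else match.group(0)

    return re.sub(r"<value>.*?</value>", _replace, xml, flags=re.S)

def truncate_to_budget(text: str, tokenizer, budget: int = 4096) -> str:
    """Context-window control: apply one identical token budget to every input."""
    ids = tokenizer.encode(text)[:budget]
    return tokenizer.decode(ids)
```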
Circularity Check
No circularity: empirical performance claims rest on independent dataset evaluations
Full rationale
The paper introduces MILM as an LLM fine-tuning approach that serializes MITS into XML triplets and uses a two-stage training procedure (value-redacted then full data). All central claims—best average performance on EHR datasets, exploitation of sampling patterns via value-redaction ablations, and gains in value-pending settings—are supported by direct empirical comparisons against baselines. No equations, derivations, or uniqueness theorems are invoked; no parameters are fitted to a subset and then relabeled as predictions; no load-bearing premises reduce to self-citations. The modeling choices (XML format, two-stage schedule) are presented as design decisions whose validity is tested externally on held-out data, keeping the argument self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs pretrained on text can process structured XML representations of time series data to learn from sampling patterns