arxiv: 2605.13711 · v1 · submitted 2026-05-13 · 💻 cs.LG

Recognition: unknown

MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

Hsing-Huan Chung, Joydeep Ghosh, Shijun Li, Suchi Saria, Xing Han, Yoav Wald

Pith reviewed 2026-05-14 20:13 UTC · model grok-4.3

classification 💻 cs.LG

keywords multimodal irregular time serieslarge language modelselectronic health recordssampling patternstwo-stage fine-tuningXML representationhealthcare predictionirregular time series

0 comments

The pith

Large language models can exploit irregular sampling patterns in multimodal time series by representing them as XML triplets and using two-stage fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that multimodal irregular time series, such as those found in electronic health records, contain predictive information not only in their measured values but also in the patterns of when and which observations are recorded. By converting these series into time-ordered triplets formatted in XML, pretrained large language models can be adapted to process the combined numerical, textual, and timing data. A two-stage fine-tuning strategy first exposes the model to value-redacted inputs so it learns from sampling patterns alone, then trains on complete data to integrate patterns with actual values. This yields the highest average performance on multiple EHR classification tasks. The approach also shows stronger relative gains in settings where some values remain unavailable at prediction time.

Core claim

MILM represents multimodal irregular time series as time-ordered triplets in XML format and fine-tunes large language models through a two-stage process. The first stage trains on value-redacted MITS to isolate learning from sampling patterns, while the second stage trains on full MITS to jointly model patterns together with observed numerical and textual content. The resulting two-stage model achieves the best average performance across EHR datasets, with value-redaction tests confirming that sampling patterns carry independent predictive signal and that the model learns to use them.

What carries the argument

The XML representation of MITS as time-ordered triplets combined with a two-stage fine-tuning process that first isolates learning from sampling patterns before integrating full observations.

If this is right

The two-stage model achieves the best average performance across multiple EHR datasets.
The single-stage counterpart ranks second best on the same tasks.
Value-redaction evaluations confirm that sampling patterns alone carry usable predictive signal.
In value-pending settings the two-stage model outperforms the direct model by a larger margin than in standard evaluation.
Preserving the time and channel of pending observations further improves in-hospital mortality prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same XML triplet encoding could let language models handle irregular multimodal data in non-healthcare domains without custom architectures.
The focus on sampling patterns points to possible uses in systems that actively decide which next measurements to request.
The method suggests pretrained language models can serve as a flexible base for sparse, heterogeneous observation streams.

Load-bearing premise

That representing irregular time series as XML triplets preserves enough temporal and channel structure for pretrained language models to learn from sampling patterns without major loss.

What would settle it

An experiment in which the two-stage model is retrained using plain text instead of XML triplets and its performance on EHR tasks falls to match or below standard baselines.

Figures

Figures reproduced from arXiv: 2605.13711 by Hsing-Huan Chung, Joydeep Ghosh, Shijun Li, Suchi Saria, Xing Han, Yoav Wald.

**Figure 2.** Figure 2: An illustration of MILM. MILM serializes the MITS to a time-ordered XML representation [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Value redaction evaluation across datasets. All models are tested with observed values [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Distribution of note character lengths across all four datasets. Dashed red and solid orange [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗

**Figure 5.** Figure 5: Distribution of pending event count (left two) and pending event rate (right two) per ICU [PITH_FULL_IMAGE:figures/full_fig_p027_5.png] view at source ↗

**Figure 6.** Figure 6: Distribution of charttime for value-pending and non-pending observations. 3.3% of all lab observations. Notes are pending at a higher rate: 13.5% (IHM) and 14.0% (LOS) [PITH_FULL_IMAGE:figures/full_fig_p027_6.png] view at source ↗

read the original abstract

Multimodal irregular time series (MITS) consist of asynchronous and irregularly sampled observations from heterogeneous numerical and textual channels. In healthcare, for example, patients' electronic health records (EHR) include irregular lab measurements and clinical notes. The irregular timing and channel patterns of observations carry predictive signal alongside the numerical values and textual content. LLMs are natural candidates for processing such heterogeneous data, given their extensive pretrained knowledge spanning textual and numerical domains. We introduce MILM (Multimodal Irregular time series Language Model), which represents MITS as time-ordered triplets in Extensible Markup Language (XML) format and fine-tunes an LLM through a two-stage strategy for MITS classification. The first stage trains on value-redacted MITS to predict from sampling patterns alone, and the second stage trains on full MITS to jointly model sampling patterns and observed values. Our two-stage model (MILM-2S) and its single-stage counterpart (MILM-Direct) achieve the best and second-best average performance on multiple EHR datasets. Further value redaction evaluations confirm that sampling patterns carry predictive signal and that MILM-2S learns to exploit them. In the value pending evaluation we introduce, where some values are unavailable at prediction time, MILM-2S outperforms MILM-Direct by a larger margin compared to standard evaluation. For MILM-2S, preserving the time and channel of value-pending observations as additional sampling information further improves in-hospital mortality prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The two-stage training on redacted data to isolate sampling patterns is a sensible engineering move for irregular EHR series, but the abstract gives almost no numbers or controls so it's impossible to tell if it actually delivers.

read the letter

The new piece is the explicit two-stage fine-tuning: first train the LLM only on the timing and channel tags with values stripped out, then switch to the full data. That isolates whether the model is really picking up the irregular sampling signal rather than just the numbers or text. The XML triplet format is a straightforward way to feed mixed numerical and textual channels into a pretrained LLM without custom encoders, and the value-pending evaluation is a useful stress test for real deployment where some labs are missing at prediction time. Those choices make sense for healthcare data where observation timing often carries information. The main weakness is that the abstract reports only rankings and a pending evaluation with no actual scores, no baseline details, no statistical tests, and no ablation numbers. Without those it's hard to know whether the two-stage version beats the direct version by a meaningful margin or whether the XML serialization is losing the fine-grained deltas that the stress test worries about. The full paper would need to show the raw metrics, the exact baselines, and checks that the model isn't just memorizing common sampling schedules. This is aimed at people already working on LLM time-series hybrids for EHR or similar irregular multimodal streams. It is worth sending to referees because the core idea is clean enough to test properly and the problem is practically relevant, even if the current write-up is too light on evidence to judge the result yet.

Referee Report

3 major / 2 minor

Summary. The paper introduces MILM, a method for multimodal irregular time series (MITS) such as EHR data. It serializes observations as time-ordered XML triplets and fine-tunes LLMs via a two-stage strategy: first training on value-redacted MITS to learn from sampling patterns alone, then on full data to jointly model patterns and values. The central claims are that MILM-2S (two-stage) and MILM-Direct (single-stage) achieve the best and second-best average performance across multiple EHR datasets, that value-redaction evaluations confirm sampling patterns carry predictive signal which MILM-2S exploits, and that MILM-2S shows larger gains in a value-pending evaluation where some values are unavailable at prediction time.

Significance. If the performance claims and exploitation results hold under rigorous evaluation, the work would offer a practical way to leverage pretrained LLMs for heterogeneous irregular data in healthcare by explicitly modeling informative sampling via two-stage training and XML serialization. The value-pending evaluation protocol is a useful addition for realistic incomplete-data settings. Significance is tempered by the need for concrete numerical support and verification that the serialization preserves usable temporal structure.

major comments (3)

[Abstract / Results] Abstract and results sections: the claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.
[Method (XML representation)] Method section on XML triplet serialization: the central assumption that time-ordered XML triplets allow the pretrained LLM to capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text, which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.
[Experiments (value redaction / value-pending)] Value-redaction and value-pending evaluation sections: these are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.

minor comments (2)

Notation for the two variants (MILM-2S vs. MILM-Direct) should be introduced earlier and used consistently to avoid reader confusion when comparing the two-stage and single-stage results.
The paper would benefit from a brief discussion of context-window limitations and how long sequences of triplets are handled, as this directly affects the temporal-structure concern.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of results, clarify methodological assumptions, and provide additional experimental details.

read point-by-point responses

Referee: [Abstract / Results] Abstract and results sections: the claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.

Authors: We agree that the abstract would benefit from explicit numerical support. In the revised manuscript we have added the average AUROC scores (with standard deviations) for MILM-2S and MILM-Direct, the identity of the strongest baseline, and a brief statement of the two-stage improvement. The main results tables already report per-dataset scores, standard deviations, and ablation comparisons; we have now highlighted statistical significance tests (paired t-tests) in the text and caption to make the load-bearing nature of the two-stage gain clearer. revision: yes
Referee: [Method (XML representation)] Method section on XML triplet serialization: the central assumption that time-ordered XML triplets allow the pretrained LLM to capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text, which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.

Authors: We acknowledge that tokenization can in principle lose precision. However, the value-redaction experiments provide direct empirical evidence that the serialized format retains usable sampling information: MILM-2S trained only on redacted XML still outperforms MILM-Direct on the same redacted inputs, and attention maps (now included in the appendix) show non-trivial attention on the explicit <time> and <channel> tags. We have added a short paragraph in Section 3.1 explaining that numeric deltas are encoded as literal strings (e.g., “delta=3.2”) and that the model is fine-tuned to treat them as distinct tokens, together with a control experiment that randomizes the order of triplets and shows a clear drop in performance. revision: partial
Referee: [Experiments (value redaction / value-pending)] Value-redaction and value-pending evaluation sections: these are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.

Authors: We have expanded both evaluation sections. The revised text now specifies: (i) the exact redaction procedure (randomly masking 30 % of values while keeping all timestamps and channels), (ii) per-dataset AUROC deltas with standard deviations between MILM-2S and MILM-Direct, (iii) dataset statistics (number of patients, average sequence length, missingness rates), and (iv) a context-window control that truncates all sequences to the same token budget. The larger margin observed in the value-pending setting remains after these controls, supporting the claim that MILM-2S exploits the preserved sampling metadata. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on independent dataset evaluations

full rationale

The paper introduces MILM as an LLM fine-tuning approach that serializes MITS into XML triplets and uses a two-stage training procedure (value-redacted then full data). All central claims—best average performance on EHR datasets, exploitation of sampling patterns via value-redaction ablations, and gains in value-pending settings—are supported by direct empirical comparisons against baselines. No equations, derivations, or uniqueness theorems are invoked; no parameters are fitted to a subset and then relabeled as predictions; no load-bearing premises reduce to self-citations. The modeling choices (XML format, two-stage schedule) are presented as design decisions whose validity is tested externally on held-out data, keeping the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach assumes LLMs can ingest and reason over XML-structured time-channel-value triplets without loss of irregular timing information; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption LLMs pretrained on text can process structured XML representations of time series data to learn from sampling patterns
Invoked by the choice of XML encoding and the first training stage on value-redacted data.

pith-pipeline@v0.9.0 · 5583 in / 1213 out tokens · 38930 ms · 2026-05-14T20:13:44.741809+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

84 extracted references · 23 canonical work pages · 5 internal anchors

[1]

Time-IMM: A dataset and benchmark for irregular multimodal multivariate time series

Ching Chang, Jeehyun Hwang, Yidan Shi, Haixin Wang, Wei Wang, Wen-Chih Peng, and Tien- Fu Chen. Time-IMM: A dataset and benchmark for irregular multimodal multivariate time series. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025. URLhttps://openreview.net/forum?id=yeqrrn51TL

2025
[2]

Improving medical predictions by irregular multimodal electronic health records modeling

Xinlu Zhang, Shiyang Li, Zhiyu Chen, Xifeng Yan, and Linda Ruth Petzold. Improving medical predictions by irregular multimodal electronic health records modeling. InInternational conference on machine learning, pages 41300–41313. PMLR, 2023

2023
[3]

Multimodal language models for financial forecasting from interleaved sequences of text and time series

Ross Koval, Nicholas Andrews, and Xifeng Yan. Multimodal language models for financial forecasting from interleaved sequences of text and time series. InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 987–1001, 2025

2025
[4]

A survey of aiops for failure management in the era of large language models.arXiv preprint arXiv:2406.11213, 2024

Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S Yu, and Ying Li. A survey of aiops for failure management in the era of large language models.arXiv preprint arXiv:2406.11213, 2024

work page arXiv 2024
[5]

Mimic-iv, a freely accessible electronic health record dataset.Scientific data, 10(1):1, 2023

Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. Mimic-iv, a freely accessible electronic health record dataset.Scientific data, 10(1):1, 2023

2023
[6]

MIMIC-IV.PhysioNet, October 2024

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV.PhysioNet, October 2024. doi: 10.13026/ kpb9-mt58. URLhttps://doi.org/10.13026/kpb9-mt58. Version 3.1

work page doi:10.13026/kpb9-mt58 2024
[7]

The eicu collaborative research database, a freely available multi-center database for critical care research.Scientific data, 5(1):180178, 2018

Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. The eicu collaborative research database, a freely available multi-center database for critical care research.Scientific data, 5(1):180178, 2018

2018
[8]

eICU Collaborative Research Database.PhysioNet, April 2019

Tom Pollard, Alistair Johnson, Jesse Raffa, Leo Anthony Celi, Omar Badawi, and Roger Mark. eICU Collaborative Research Database.PhysioNet, April 2019. doi: 10.13026/C2WM1R. URL https://doi.org/10.13026/C2WM1R. Version 2.0

work page doi:10.13026/c2wm1r 2019
[9]

Using clinical notes with time series data for icu management

Swaraj Khadanga, Karan Aggarwal, Shafiq Joty, and Jaideep Srivastava. Using clinical notes with time series data for icu management. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6432–6437, 2019

2019
[10]

Fusemoe: Mixture-of-experts transformers for fleximodal fusion.Advances in Neural Information Processing Systems, 37: 67850–67900, 2024

Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, and Suchi Saria. Fusemoe: Mixture-of-experts transformers for fleximodal fusion.Advances in Neural Information Processing Systems, 37: 67850–67900, 2024

2024
[11]

Mind the missing: Variable-aware representation learning for irregular ehr time series using large language models.arXiv preprint arXiv:2509.22121, 2025

Jeong Eul Kwon, Joo Heung Yoon, and Hyo Kyung Lee. Mind the missing: Variable-aware representation learning for irregular ehr time series using large language models.arXiv preprint arXiv:2509.22121, 2025

work page arXiv 2025
[12]

Unleashing the power of pre-trained language models for irregularly sampled time series

Weijia Zhang, Chenlong Yin, Hao Liu, and Hui Xiong. Unleashing the power of pre-trained language models for irregularly sampled time series. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 3831–3842, 2025

2025
[13]

A survey on principles, models and methods for learning from irregularly sampled time series.arXiv preprint arXiv:2012.00168, 2020

Satya Narayan Shukla and Benjamin M Marlin. A survey on principles, models and methods for learning from irregularly sampled time series.arXiv preprint arXiv:2012.00168, 2020

work page arXiv 2012
[14]

Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

2018
[15]

Interpolation-prediction networks for irregularly sampled time series

Satya Narayan Shukla and Benjamin Marlin. Interpolation-prediction networks for irregularly sampled time series. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=r1efr3C9Ym. 10

2019
[16]

Multi-time attention networks for irregularly sampled time series

Satya Narayan Shukla and Benjamin Marlin. Multi-time attention networks for irregularly sampled time series. InInternational Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=4c0J6lwQ4_

2021
[17]

Adaptive time encoding for irregular multivariate time-series classification

Sangho Lee, Kyeongseo Min, Youngdoo Son, and Hyungrok Do. Adaptive time encoding for irregular multivariate time-series classification. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025
[18]

Latent ordinary differential equations for irregularly-sampled time series.Advances in neural information processing systems, 32, 2019

Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. Latent ordinary differential equations for irregularly-sampled time series.Advances in neural information processing systems, 32, 2019

2019
[19]

Gru-ode-bayes: Continuous modeling of sporadically-observed time series.Advances in neural information processing systems, 32, 2019

Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. Gru-ode-bayes: Continuous modeling of sporadically-observed time series.Advances in neural information processing systems, 32, 2019

2019
[20]

Neural controlled differential equations for irregular time series.Advances in neural information processing systems, 33: 6696–6707, 2020

Patrick Kidger, James Morrill, James Foster, and Terry Lyons. Neural controlled differential equations for irregular time series.Advances in neural information processing systems, 33: 6696–6707, 2020

2020
[21]

Modeling irregular time series with continuous recurrent units

Mona Schirmer, Mazin Eltayeb, Stefan Lessmann, and Maja Rudolph. Modeling irregular time series with continuous recurrent units. InInternational conference on machine learning, pages 19388–19405. PMLR, 2022

2022
[22]

Neural continuous-discrete state space models for irregularly-sampled time series

Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. Neural continuous-discrete state space models for irregularly-sampled time series. InInternational Conference on Machine Learning, pages 926–951. PMLR, 2023

2023
[23]

Contiformer: Continuous-time transformer for irregular time series modeling.Advances in Neural Information Processing Systems, 36:47143–47175, 2023

Yuqi Chen, Kan Ren, Yansen Wang, Yuchen Fang, Weiwei Sun, and Dongsheng Li. Contiformer: Continuous-time transformer for irregular time series modeling.Advances in Neural Information Processing Systems, 36:47143–47175, 2023

2023
[24]

Warpformer: A multi-scale modeling approach for irregular clinical time series

Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, and Jia Li. Warpformer: A multi-scale modeling approach for irregular clinical time series. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3273–3285, 2023

2023
[25]

Set functions for time series

Max Horn, Michael Moor, Christian Bock, Bastian Rieck, and Karsten Borgwardt. Set functions for time series. InInternational Conference on Machine Learning, pages 4353–4363. PMLR, 2020

2020
[26]

Graph-guided net- work for irregularly sampled multivariate time series

Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, and Marinka Zitnik. Graph-guided net- work for irregularly sampled multivariate time series. InInternational Conference on Learning Representations, 2022. URLhttps://openreview.net/forum?id=Kwm8I7dU-l5

2022
[27]

Irregular multivariate time series forecasting: A transformable patching graph neural networks approach

Weijia Zhang, Chenlong Yin, Hao Liu, Xiaofang Zhou, and Hui Xiong. Irregular multivariate time series forecasting: A transformable patching graph neural networks approach. InForty-first International Conference on Machine Learning, 2024

2024
[28]

Grafiti: Graphs for forecasting irregularly sampled time series

Vijaya Krishna Yalavarthi, Kiran Madhusudhanan, Randolf Scholz, Nourhan Ahmed, Johannes Burchert, Shayan Jawed, Stefan Born, and Lars Schmidt-Thieme. Grafiti: Graphs for forecasting irregularly sampled time series. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16255–16263, 2024

2024
[29]

Time series as images: Vision transformer for irregularly sampled time series.Advances in Neural Information Processing Systems, 36:49187–49204, 2023

Zekun Li, Shiyang Li, and Xifeng Yan. Time series as images: Vision transformer for irregularly sampled time series.Advances in Neural Information Processing Systems, 36:49187–49204, 2023

2023
[30]

Promptcast: A new prompt-based learning paradigm for time series forecasting.IEEE Transactions on Knowledge and Data Engineering, 36(11):6851–6864, 2023

Hao Xue and Flora D Salim. Promptcast: A new prompt-based learning paradigm for time series forecasting.IEEE Transactions on Knowledge and Data Engineering, 36(11):6851–6864, 2023

2023
[31]

Large language models are zero-shot time series forecasters.Advances in neural information processing systems, 36: 19622–19635, 2023

Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew G Wilson. Large language models are zero-shot time series forecasters.Advances in neural information processing systems, 36: 19622–19635, 2023. 11

2023
[32]

Tempo: Prompt-based generative pre-trained transformer for time series forecasting.arXiv preprint arXiv:2310.04948, 2023

Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, and Yan Liu. Tempo: Prompt-based generative pre-trained transformer for time series forecasting.arXiv preprint arXiv:2310.04948, 2023

work page arXiv 2023
[33]

One fits all: Power general time series analysis by pretrained lm.Advances in neural information processing systems, 36:43322–43355, 2023

Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. One fits all: Power general time series analysis by pretrained lm.Advances in neural information processing systems, 36:43322–43355, 2023

2023
[34]

Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen

Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y . Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogramming large language models. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum? id=Unb5CVPtae

2024
[35]

Test: Text prototype aligned embedding to activate llm’s ability for time series.arXiv preprint arXiv:2308.08241, 2023

Chenxi Sun, Hongyan Li, Yaliang Li, and Shenda Hong. Test: Text prototype aligned embedding to activate llm’s ability for time series.arXiv preprint arXiv:2308.08241, 2023

work page arXiv 2023
[36]

Autotimes: Au- toregressive time series forecasters via large language models.Advances in Neural Information Processing Systems, 37:122154–122184, 2024

Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Autotimes: Au- toregressive time series forecasters via large language models.Advances in Neural Information Processing Systems, 37:122154–122184, 2024

2024
[37]

S2ip-llm: Semantic space informed prompt learning with llm for time series forecasting

Zijie Pan, Yushan Jiang, Sahil Garg, Anderson Schneider, Yuriy Nevmyvaka, and Dongjin Song. S2ip-llm: Semantic space informed prompt learning with llm for time series forecasting. In Forty-first International Conference on Machine Learning, 2024

2024
[38]

Calf: Aligning llms for time series forecasting via cross-modal fine-tuning

Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, and Shu-Tao Xia. Calf: Aligning llms for time series forecasting via cross-modal fine-tuning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18915–18923, 2025

2025
[39]

Timecma: Towards llm-empowered multivariate time series forecasting via cross-modality alignment

Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, and Rui Zhao. Timecma: Towards llm-empowered multivariate time series forecasting via cross-modality alignment. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18780–18788, 2025

2025
[40]

Multimodal llms for health grounded in individual-specific data

Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y McLean, and Nicholas A Furlotte. Multimodal llms for health grounded in individual-specific data. InWorkshop on Machine Learning for Multimodal Healthcare Data, pages 86–102. Springer, 2023

2023
[41]

Medtsllm: Leveraging llms for multimodal medical time series analysis.arXiv preprint arXiv:2408.07773, 2024

Nimeesha Chan, Felix Parker, William Bennett, Tianyi Wu, Mung Yao Jia, James Fackler, and Kimia Ghobadi. Medtsllm: Leveraging llms for multimodal medical time series analysis.arXiv preprint arXiv:2408.07773, 2024

work page arXiv 2024
[42]

Gpt4mts: Prompt-based large language model for multimodal time-series forecasting

Furong Jia, Kevin Wang, Yixiang Zheng, Defu Cao, and Yan Liu. Gpt4mts: Prompt-based large language model for multimodal time-series forecasting. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 23343–23351, 2024

2024
[43]

Instructime: Advancing time series classification with multimodal language modeling

Mingyue Cheng, Yiheng Chen, Qi Liu, Zhiding Liu, Yucong Luo, and Enhong Chen. Instructime: Advancing time series classification with multimodal language modeling. InProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pages 792–800, 2025

2025
[44]

Chattime: A unified multimodal time series foundation model bridging numerical and textual data

Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. Chattime: A unified multimodal time series foundation model bridging numerical and textual data. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 12694–12702, 2025

2025
[45]

Timecap: Learning to contextualize, augment, and predict time series events with large language model agents

Geon Lee, Wenchao Yu, Kijung Shin, Wei Cheng, and Haifeng Chen. Timecap: Learning to contextualize, augment, and predict time series events with large language model agents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18082–18090, 2025. 12

2025
[46]

Opentslm: Time-series language models for reasoning over multivariate medical text-and time-series data.arXiv preprint arXiv:2510.02410, 2025

Patrick Langer, Thomas Kaar, Max Rosenblattl, Maxwell A Xu, Winnie Chow, Martin Maritsch, Robert Jakob, Ning Wang, Juncheng Liu, Aradhana Verma, et al. Opentslm: Time-series language models for reasoning over multivariate medical text-and time-series data.arXiv preprint arXiv:2510.02410, 2025

work page arXiv 2025
[47]

A decoder-only foundation model for time-series forecasting.arXiv preprint arXiv:2310.10688, 2023

Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting.arXiv preprint arXiv:2310.10688, 2023

work page arXiv 2023
[48]

Time-moe: Billion-scale time series foundation models with mixture of experts.arXiv preprint arXiv:2409.16040, 2024

Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, and Ming Jin. Time-moe: Billion-scale time series foundation models with mixture of experts.arXiv preprint arXiv:2409.16040, 2024

work page arXiv 2024
[49]

Chronos: Learning the Language of Time Series

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series.arXiv preprint arXiv:2403.07815, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[50]

Chronos-2: From univariate to universal forecasting.arXiv preprint arXiv:2510.15821, 2025

Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, et al. Chronos-2: From univariate to universal forecasting.arXiv preprint arXiv:2510.15821, 2025

work page arXiv 2025
[51]

Tpp-llm: Modeling temporal point processes by efficiently fine-tuning large language models.arXiv preprint arXiv:2410.02062, 2024

Zefang Liu and Yinzhu Quan. Tpp-llm: Modeling temporal point processes by efficiently fine-tuning large language models.arXiv preprint arXiv:2410.02062, 2024

work page arXiv 2024
[52]

Last stop for modeling asynchronous time series.arXiv preprint arXiv:2502.01922, 2025

Shubham Gupta, Thibaut Durand, Graham Taylor, et al. Last stop for modeling asynchronous time series.arXiv preprint arXiv:2502.01922, 2025

work page arXiv 2025
[53]

Byte-token enhanced language models for temporal point processes analysis

Quyu Kong, Yixuan Zhang, Yang Liu, Panrong Tong, Enqi Liu, and Feng Zhou. Byte-token enhanced language models for temporal point processes analysis. InProceedings of the ACM Web Conference 2026, pages 7013–7023, 2026

2026
[54]

Rose Sisk, Lijing Lin, Matthew Sperrin, Jessica K Barrett, Brian Tom, Karla Diaz-Ordaz, Niels Peek, and Glen P Martin. Informative presence and observation in routine health data: a review of methodology for clinical risk prediction.Journal of the American Medical Informatics Association, 28(1):155–166, 2021

2021
[55]

Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record?Journal of biomedical informatics, 139:104306, 2023

Amelia LM Tan, Emily J Getzen, Meghan R Hutch, Zachary H Strasser, Alba Gutiérrez- Sacristán, Trang T Le, Arianna Dagliati, Michele Morris, David A Hanauer, Bertrand Moal, et al. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record?Journal of biomedical informatics, 139:104306, 2023

2023
[56]

John Wiley & Sons, 2019

Roderick JA Little and Donald B Rubin.Statistical analysis with missing data. John Wiley & Sons, 2019

2019
[57]

Analysis of longitudinal data with irregular, outcome-dependent follow-up.Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(3):791–813, 2004

Haiqun Lin, Daniel O Scharfstein, and Robert A Rosenheck. Analysis of longitudinal data with irregular, outcome-dependent follow-up.Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(3):791–813, 2004

2004
[58]

Account- ing for informative sampling when learning to forecast treatment outcomes over time

Toon Vanderschueren, Alicia Curth, Wouter Verbeke, and Mihaela Van Der Schaar. Account- ing for informative sampling when learning to forecast treatment outcomes over time. In International Conference on Machine Learning, pages 34855–34874. PMLR, 2023

2023
[59]

Mixed-effects models for health care longitudinal data with an informative visiting process: A monte carlo simulation study.Statistica Neerlandica, 74(1):5–23, 2020

Alessandro Gasparini, Keith R Abrams, Jessica K Barrett, Rupert W Major, Michael J Sweeting, Nigel J Brunskill, and Michael J Crowther. Mixed-effects models for health care longitudinal data with an informative visiting process: A monte carlo simulation study.Statistica Neerlandica, 74(1):5–23, 2020

2020
[60]

Prediction of survival outcomes under clinical presence shift: A joint neural network architecture

Vincent Jeanselme, Glen Martin, Matthew Sperrin, Niels Peek, Brian Tom, and Jessica Barrett. Prediction of survival outcomes under clinical presence shift: A joint neural network architecture. arXiv preprint arXiv:2508.05472, 2025

work page arXiv 2025
[61]

Mind the data gap: Missingness still shapes large language model prognoses.arXiv preprint arXiv:2512.00479, 2025

Yuta Kobayashi, Vincent Jeanselme, and Shalmali Joshi. Mind the data gap: Missingness still shapes large language model prognoses.arXiv preprint arXiv:2512.00479, 2025. 13

work page arXiv 2025
[62]

MIMIC-IV Clinical Database Demo.PhysioNet, January 2023

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV Clinical Database Demo.PhysioNet, January 2023. doi: 10.13026/dp1f-ex47. URLhttps://doi.org/10.13026/dp1f-ex47. Version 2.2

work page doi:10.13026/dp1f-ex47 2023
[63]

MIMIC-IV- Note: Deidentified free-text clinical notes.PhysioNet, January 2023

Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV- Note: Deidentified free-text clinical notes.PhysioNet, January 2023. doi: 10.13026/1n74-ne17. URLhttps://doi.org/10.13026/1n74-ne17. Version 2.2

work page doi:10.13026/1n74-ne17 2023
[64]

Leveraging large language models for multiple choice question answering

Joshua Robinson and David Wingate. Leveraging large language models for multiple choice question answering. InThe Eleventh International Conference on Learning Representations,
[65]

URLhttps://openreview.net/forum?id=yKbprarjc5B
[66]

Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data.Scientific data, 6(1):96, 2019

2019
[67]

Benchmarking machine learning models on multi-centre eicu critical care dataset.Plos one, 15(7):e0235424, 2020

Seyedmostafa Sheikhalishahi, Vevake Balaraman, and Venet Osmani. Benchmarking machine learning models on multi-centre eicu critical care dataset.Plos one, 15(7):e0235424, 2020

2020
[68]

Biobert: a pre-trained biomedical language representation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. Biobert: a pre-trained biomedical language representation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020

2020
[69]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[70]

Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

2023
[71]

Sglang: Efficient execution of structured language model programs.Advances in neural information processing systems, 37: 62557–62583, 2024

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody H Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E Gonzalez, et al. Sglang: Efficient execution of structured language model programs.Advances in neural information processing systems, 37: 62557–62583, 2024

2024
[72]

Mind the performance gap: examining dataset shift during prospective validation

Erkin Otles, Jeeheh Oh, Benjamin Li, Michelle Bochinski, Hyeon Joo, Justin Ortwine, Erica Shenoy, Laraine Washer, Vincent B Young, Krishna Rao, et al. Mind the performance gap: examining dataset shift during prospective validation. InMachine Learning for Healthcare Conference, pages 506–534. PMLR, 2021

2021
[73]

Retain: An interpretable predictive model for healthcare using reverse time attention mechanism.Advances in neural information processing systems, 29, 2016

Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. Retain: An interpretable predictive model for healthcare using reverse time attention mechanism.Advances in neural information processing systems, 29, 2016

2016
[74]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum? id=Bkg6RiCqY7

2019
[75]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInterna- tional Conference on Learning Representations (ICLR), 2015. URL https://arxiv.org/ abs/1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2015
[76]

Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

2019
[77]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019
[78]

Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024. 14

2024
[79]

Time2Vec: Learning a Vector Representation of Time

Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2vec: Learning a vector representation of time.arXiv preprint arXiv:1907.05321, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[80]

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, et al. Medgemma technical report.arXiv preprint arXiv:2507.05201, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

Showing first 80 references.