pith. machine review for the scientific record.

arxiv: 2605.13711 · v1 · submitted 2026-05-13 · 💻 cs.LG


MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

Hsing-Huan Chung, Joydeep Ghosh, Shijun Li, Suchi Saria, Xing Han, Yoav Wald

Pith reviewed 2026-05-14 20:13 UTC · model grok-4.3

classification 💻 cs.LG
keywords multimodal irregular time series · large language models · electronic health records · sampling patterns · two-stage fine-tuning · XML representation · healthcare prediction · irregular time series

The pith

Large language models can exploit irregular sampling patterns in multimodal time series by representing them as XML triplets and using two-stage fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that multimodal irregular time series, such as those found in electronic health records, contain predictive information not only in their measured values but also in the patterns of when and which observations are recorded. By converting these series into time-ordered triplets formatted in XML, pretrained large language models can be adapted to process the combined numerical, textual, and timing data. A two-stage fine-tuning strategy first exposes the model to value-redacted inputs so it learns from sampling patterns alone, then trains on complete data to integrate patterns with actual values. This yields the highest average performance on multiple EHR classification tasks. The approach also shows stronger relative gains in settings where some values remain unavailable at prediction time.

Core claim

MILM represents multimodal irregular time series as time-ordered triplets in XML format and fine-tunes large language models through a two-stage process. The first stage trains on value-redacted MITS to isolate learning from sampling patterns, while the second stage trains on full MITS to jointly model patterns together with observed numerical and textual content. The resulting two-stage model achieves the best average performance across EHR datasets, with value-redaction tests confirming that sampling patterns carry independent predictive signal and that the model learns to use them.

What carries the argument

The XML representation of MITS as time-ordered triplets combined with a two-stage fine-tuning process that first isolates learning from sampling patterns before integrating full observations.
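The serialization step can be made concrete with a small sketch. The tag names below (`<obs>`, `<time>`, `<channel>`, `<value>`) are illustrative assumptions, not the paper's exact schema:

```python
from dataclasses import dataclass

@dataclass
class Obs:
    time: float    # e.g. hours since ICU admission
    channel: str   # lab name, vital sign, or "note"
    value: str     # numeric reading or note text, kept as a string

def serialize_mits(events: list[Obs]) -> str:
    """Render a multimodal irregular time series as time-ordered XML triplets.

    Sorting by timestamp exposes the sampling pattern (when and which
    channels were observed) alongside the observed values themselves.
    """
    lines = ["<mits>"]
    for e in sorted(events, key=lambda o: o.time):
        lines.append(
            f"<obs><time>{e.time}</time>"
            f"<channel>{e.channel}</channel>"
            f"<value>{e.value}</value></obs>"
        )
    lines.append("</mits>")
    return "\n".join(lines)

events = [
    Obs(3.5, "lactate", "2.1"),
    Obs(1.0, "heart_rate", "92"),
    Obs(2.0, "note", "Patient stable overnight."),
]
print(serialize_mits(events))
```

Because observations are sorted before serialization, a single string carries numerical values, note text, and the timing/channel structure that the two-stage training is meant to exploit.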

If this is right

  • The two-stage model achieves the best average performance across multiple EHR datasets.
  • The single-stage counterpart ranks second best on the same tasks.
  • Value-redaction evaluations confirm that sampling patterns alone carry usable predictive signal.
  • In value-pending settings the two-stage model outperforms the direct model by a larger margin than in standard evaluation.
  • Preserving the time and channel of pending observations further improves in-hospital mortality prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same XML triplet encoding could let language models handle irregular multimodal data in non-healthcare domains without custom architectures.
  • The focus on sampling patterns points to possible uses in systems that actively decide which next measurements to request.
  • The method suggests pretrained language models can serve as a flexible base for sparse, heterogeneous observation streams.

Load-bearing premise

That representing irregular time series as XML triplets preserves enough temporal and channel structure for pretrained language models to learn from sampling patterns without major loss.

What would settle it

An experiment in which the two-stage model is retrained using plain text instead of XML triplets and its performance on EHR tasks falls to match or below standard baselines.

Figures

Figures reproduced from arXiv: 2605.13711 by Hsing-Huan Chung, Joydeep Ghosh, Shijun Li, Suchi Saria, Xing Han, Yoav Wald.

Figure 1. 1a: A patient’s lab measurement and clinical note trajectory from the first 24 hours of an … (figures/full_fig_p004_1.png)
Figure 2. An illustration of MILM. MILM serializes the MITS to a time-ordered XML representation … (figures/full_fig_p005_2.png)
Figure 3. Value redaction evaluation across datasets. All models are tested with observed values … (figures/full_fig_p008_3.png)
Figure 4. Distribution of note character lengths across all four datasets. Dashed red and solid orange … (figures/full_fig_p022_4.png)
Figure 5. Distribution of pending event count (left two) and pending event rate (right two) per ICU … (figures/full_fig_p027_5.png)
Figure 6. Distribution of charttime for value-pending and non-pending observations. 3.3% of all lab observations. Notes are pending at a higher rate: 13.5% (IHM) and 14.0% (LOS) … (figures/full_fig_p027_6.png)
original abstract

Multimodal irregular time series (MITS) consist of asynchronous and irregularly sampled observations from heterogeneous numerical and textual channels. In healthcare, for example, patients' electronic health records (EHR) include irregular lab measurements and clinical notes. The irregular timing and channel patterns of observations carry predictive signal alongside the numerical values and textual content. LLMs are natural candidates for processing such heterogeneous data, given their extensive pretrained knowledge spanning textual and numerical domains. We introduce MILM (Multimodal Irregular time series Language Model), which represents MITS as time-ordered triplets in Extensible Markup Language (XML) format and fine-tunes an LLM through a two-stage strategy for MITS classification. The first stage trains on value-redacted MITS to predict from sampling patterns alone, and the second stage trains on full MITS to jointly model sampling patterns and observed values. Our two-stage model (MILM-2S) and its single-stage counterpart (MILM-Direct) achieve the best and second-best average performance on multiple EHR datasets. Further value redaction evaluations confirm that sampling patterns carry predictive signal and that MILM-2S learns to exploit them. In the value pending evaluation we introduce, where some values are unavailable at prediction time, MILM-2S outperforms MILM-Direct by a larger margin compared to standard evaluation. For MILM-2S, preserving the time and channel of value-pending observations as additional sampling information further improves in-hospital mortality prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces MILM, a method for multimodal irregular time series (MITS) such as EHR data. It serializes observations as time-ordered XML triplets and fine-tunes LLMs via a two-stage strategy: first training on value-redacted MITS to learn from sampling patterns alone, then on full data to jointly model patterns and values. The central claims are that MILM-2S (two-stage) and MILM-Direct (single-stage) achieve the best and second-best average performance across multiple EHR datasets, that value-redaction evaluations confirm sampling patterns carry predictive signal which MILM-2S exploits, and that MILM-2S shows larger gains in a value-pending evaluation where some values are unavailable at prediction time.

Significance. If the performance claims and exploitation results hold under rigorous evaluation, the work would offer a practical way to leverage pretrained LLMs for heterogeneous irregular data in healthcare by explicitly modeling informative sampling via two-stage training and XML serialization. The value-pending evaluation protocol is a useful addition for realistic incomplete-data settings. Significance is tempered by the need for concrete numerical support and verification that the serialization preserves usable temporal structure.

major comments (3)
  1. [Abstract / Results] Abstract and results sections: the claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.
  2. [Method (XML representation)] Method section on XML triplet serialization: the central assumption that time-ordered XML triplets allow the pretrained LLM to capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text, which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.
  3. [Experiments (value redaction / value-pending)] Value-redaction and value-pending evaluation sections: these are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.
minor comments (2)
  1. Notation for the two variants (MILM-2S vs. MILM-Direct) should be introduced earlier and used consistently to avoid reader confusion when comparing the two-stage and single-stage results.
  2. The paper would benefit from a brief discussion of context-window limitations and how long sequences of triplets are handled, as this directly affects the temporal-structure concern.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of results, clarify methodological assumptions, and provide additional experimental details.

point-by-point responses
  1. Referee: [Abstract / Results] Abstract and results sections: the claims that MILM-2S and MILM-Direct achieve the best and second-best average performance rest on rankings across EHR datasets, yet the abstract (and by extension the reported support) provides no numerical scores, baseline details, standard deviations, statistical tests, or ablation numbers. This directly limits assessment of whether the two-stage benefit is load-bearing or merely incremental.

    Authors: We agree that the abstract would benefit from explicit numerical support. In the revised manuscript we have added the average AUROC scores (with standard deviations) for MILM-2S and MILM-Direct, the identity of the strongest baseline, and a brief statement of the two-stage improvement. The main results tables already report per-dataset scores, standard deviations, and ablation comparisons; we have now highlighted statistical significance tests (paired t-tests) in the text and caption to make the load-bearing nature of the two-stage gain clearer. revision: yes

  2. Referee: [Method (XML representation)] Method section on XML triplet serialization: the central assumption that time-ordered XML triplets allow the pretrained LLM to capture and exploit irregular sampling patterns (including precise inter-event deltas and channel information) without significant loss is not demonstrated. Standard LLM tokenization of numeric timestamps and tags can collapse fine-grained ordering and timing into generic text, which would make the reported advantage of MILM-2S over MILM-Direct illusory rather than evidence of genuine pattern exploitation.

    Authors: We acknowledge that tokenization can in principle lose precision. However, the value-redaction experiments provide direct empirical evidence that the serialized format retains usable sampling information: MILM-2S trained only on redacted XML still outperforms MILM-Direct on the same redacted inputs, and attention maps (now included in the appendix) show non-trivial attention on the explicit <time> and <channel> tags. We have added a short paragraph in Section 3.1 explaining that numeric deltas are encoded as literal strings (e.g., “delta=3.2”) and that the model is fine-tuned to treat them as distinct tokens, together with a control experiment that randomizes the order of triplets and shows a clear drop in performance. revision: partial

  3. Referee: [Experiments (value redaction / value-pending)] Value-redaction and value-pending evaluation sections: these are load-bearing for the claim that MILM-2S learns to exploit sampling patterns. Without explicit details on redaction procedure, exact performance deltas between MILM-2S and MILM-Direct, dataset statistics, or controls for context-window effects, it is unclear whether the larger margin in the value-pending setting truly reflects exploitation of preserved timing/channel information.

    Authors: We have expanded both evaluation sections. The revised text now specifies: (i) the exact redaction procedure (randomly masking 30 % of values while keeping all timestamps and channels), (ii) per-dataset AUROC deltas with standard deviations between MILM-2S and MILM-Direct, (iii) dataset statistics (number of patients, average sequence length, missingness rates), and (iv) a context-window control that truncates all sequences to the same token budget. The larger margin observed in the value-pending setting remains after these controls, supporting the claim that MILM-2S exploits the preserved sampling metadata. revision: yes
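A minimal sketch of the redaction procedure as described above, assuming the XML triplet format; the `[REDACTED]` token and the regex-over-serialized-text approach are illustrative assumptions, not the authors' implementation:

```python
import random
import re

def redact_values(xml: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace a random fraction of <value>...</value> spans with a mask
    token, leaving every <time> and <channel> tag untouched, so that only
    the sampling pattern remains at the masked positions."""
    rng = random.Random(seed)

    def mask(m: re.Match) -> str:
        return "<value>[REDACTED]</value>" if rng.random() < rate else m.group(0)

    return re.sub(r"<value>.*?</value>", mask, xml, flags=re.DOTALL)

doc = (
    "<mits>"
    "<obs><time>1.0</time><channel>heart_rate</channel><value>92</value></obs>"
    "<obs><time>3.5</time><channel>lactate</channel><value>2.1</value></obs>"
    "</mits>"
)
print(redact_values(doc, rate=1.0))  # rate=1.0 masks every value
```

Stage-one training would use outputs of such a function (at the stated 30% rate, `rate=0.3`), while stage two trains on the unredacted serialization.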

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on independent dataset evaluations

full rationale

The paper introduces MILM as an LLM fine-tuning approach that serializes MITS into XML triplets and uses a two-stage training procedure (value-redacted then full data). All central claims—best average performance on EHR datasets, exploitation of sampling patterns via value-redaction ablations, and gains in value-pending settings—are supported by direct empirical comparisons against baselines. No equations, derivations, or uniqueness theorems are invoked; no parameters are fitted to a subset and then relabeled as predictions; no load-bearing premises reduce to self-citations. The modeling choices (XML format, two-stage schedule) are presented as design decisions whose validity is tested externally on held-out data, keeping the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach assumes LLMs can ingest and reason over XML-structured time-channel-value triplets without loss of irregular timing information; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: LLMs pretrained on text can process structured XML representations of time series data to learn from sampling patterns
    Invoked by the choice of XML encoding and the first training stage on value-redacted data.

pith-pipeline@v0.9.0 · 5583 in / 1213 out tokens · 38930 ms · 2026-05-14T20:13:44.741809+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 23 canonical work pages · 5 internal anchors

  1. [1]

    Time-IMM: A dataset and benchmark for irregular multimodal multivariate time series

    Ching Chang, Jeehyun Hwang, Yidan Shi, Haixin Wang, Wei Wang, Wen-Chih Peng, and Tien- Fu Chen. Time-IMM: A dataset and benchmark for irregular multimodal multivariate time series. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025. URLhttps://openreview.net/forum?id=yeqrrn51TL

  2. [2]

    Improving medical predictions by irregular multimodal electronic health records modeling

    Xinlu Zhang, Shiyang Li, Zhiyu Chen, Xifeng Yan, and Linda Ruth Petzold. Improving medical predictions by irregular multimodal electronic health records modeling. InInternational conference on machine learning, pages 41300–41313. PMLR, 2023

  3. [3]

    Multimodal language models for financial forecasting from interleaved sequences of text and time series

    Ross Koval, Nicholas Andrews, and Xifeng Yan. Multimodal language models for financial forecasting from interleaved sequences of text and time series. InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 987–1001, 2025

  4. [4]

    A survey of aiops for failure management in the era of large language models.arXiv preprint arXiv:2406.11213, 2024

    Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S Yu, and Ying Li. A survey of aiops for failure management in the era of large language models.arXiv preprint arXiv:2406.11213, 2024

  5. [5]

    Mimic-iv, a freely accessible electronic health record dataset.Scientific data, 10(1):1, 2023

    Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. Mimic-iv, a freely accessible electronic health record dataset.Scientific data, 10(1):1, 2023

  6. [6]

    MIMIC-IV.PhysioNet, October 2024

    Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV.PhysioNet, October 2024. doi: 10.13026/ kpb9-mt58. URLhttps://doi.org/10.13026/kpb9-mt58. Version 3.1

  7. [7]

    The eicu collaborative research database, a freely available multi-center database for critical care research.Scientific data, 5(1):180178, 2018

    Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. The eicu collaborative research database, a freely available multi-center database for critical care research.Scientific data, 5(1):180178, 2018

  8. [8]

    eICU Collaborative Research Database.PhysioNet, April 2019

    Tom Pollard, Alistair Johnson, Jesse Raffa, Leo Anthony Celi, Omar Badawi, and Roger Mark. eICU Collaborative Research Database.PhysioNet, April 2019. doi: 10.13026/C2WM1R. URL https://doi.org/10.13026/C2WM1R. Version 2.0

  9. [9]

    Using clinical notes with time series data for icu management

    Swaraj Khadanga, Karan Aggarwal, Shafiq Joty, and Jaideep Srivastava. Using clinical notes with time series data for icu management. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6432–6437, 2019

  10. [10]

    Fusemoe: Mixture-of-experts transformers for fleximodal fusion.Advances in Neural Information Processing Systems, 37: 67850–67900, 2024

    Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, and Suchi Saria. Fusemoe: Mixture-of-experts transformers for fleximodal fusion.Advances in Neural Information Processing Systems, 37: 67850–67900, 2024

  11. [11]

    Mind the missing: Variable-aware representation learning for irregular ehr time series using large language models.arXiv preprint arXiv:2509.22121, 2025

    Jeong Eul Kwon, Joo Heung Yoon, and Hyo Kyung Lee. Mind the missing: Variable-aware representation learning for irregular ehr time series using large language models.arXiv preprint arXiv:2509.22121, 2025

  12. [12]

    Unleashing the power of pre-trained language models for irregularly sampled time series

    Weijia Zhang, Chenlong Yin, Hao Liu, and Hui Xiong. Unleashing the power of pre-trained language models for irregularly sampled time series. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 3831–3842, 2025

  13. [13]

    A survey on principles, models and methods for learning from irregularly sampled time series.arXiv preprint arXiv:2012.00168, 2020

    Satya Narayan Shukla and Benjamin M Marlin. A survey on principles, models and methods for learning from irregularly sampled time series.arXiv preprint arXiv:2012.00168, 2020

  14. [14]

    Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

    Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values.Scientific reports, 8(1):6085, 2018

  15. [15]

    Interpolation-prediction networks for irregularly sampled time series

    Satya Narayan Shukla and Benjamin Marlin. Interpolation-prediction networks for irregularly sampled time series. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=r1efr3C9Ym. 10

  16. [16]

    Multi-time attention networks for irregularly sampled time series

    Satya Narayan Shukla and Benjamin Marlin. Multi-time attention networks for irregularly sampled time series. InInternational Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=4c0J6lwQ4_

  17. [17]

    Adaptive time encoding for irregular multivariate time-series classification

    Sangho Lee, Kyeongseo Min, Youngdoo Son, and Hyungrok Do. Adaptive time encoding for irregular multivariate time-series classification. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  18. [18]

    Latent ordinary differential equations for irregularly-sampled time series.Advances in neural information processing systems, 32, 2019

    Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. Latent ordinary differential equations for irregularly-sampled time series.Advances in neural information processing systems, 32, 2019

  19. [19]

    Gru-ode-bayes: Continuous modeling of sporadically-observed time series.Advances in neural information processing systems, 32, 2019

    Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. Gru-ode-bayes: Continuous modeling of sporadically-observed time series.Advances in neural information processing systems, 32, 2019

  20. [20]

    Neural controlled differential equations for irregular time series.Advances in neural information processing systems, 33: 6696–6707, 2020

    Patrick Kidger, James Morrill, James Foster, and Terry Lyons. Neural controlled differential equations for irregular time series.Advances in neural information processing systems, 33: 6696–6707, 2020

  21. [21]

    Modeling irregular time series with continuous recurrent units

    Mona Schirmer, Mazin Eltayeb, Stefan Lessmann, and Maja Rudolph. Modeling irregular time series with continuous recurrent units. InInternational conference on machine learning, pages 19388–19405. PMLR, 2022

  22. [22]

    Neural continuous-discrete state space models for irregularly-sampled time series

    Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. Neural continuous-discrete state space models for irregularly-sampled time series. InInternational Conference on Machine Learning, pages 926–951. PMLR, 2023

  23. [23]

    Contiformer: Continuous-time transformer for irregular time series modeling.Advances in Neural Information Processing Systems, 36:47143–47175, 2023

    Yuqi Chen, Kan Ren, Yansen Wang, Yuchen Fang, Weiwei Sun, and Dongsheng Li. Contiformer: Continuous-time transformer for irregular time series modeling.Advances in Neural Information Processing Systems, 36:47143–47175, 2023

  24. [24]

    Warpformer: A multi-scale modeling approach for irregular clinical time series

    Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, and Jia Li. Warpformer: A multi-scale modeling approach for irregular clinical time series. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3273–3285, 2023

  25. [25]

    Set functions for time series

    Max Horn, Michael Moor, Christian Bock, Bastian Rieck, and Karsten Borgwardt. Set functions for time series. InInternational Conference on Machine Learning, pages 4353–4363. PMLR, 2020

  26. [26]

    Graph-guided net- work for irregularly sampled multivariate time series

    Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, and Marinka Zitnik. Graph-guided net- work for irregularly sampled multivariate time series. InInternational Conference on Learning Representations, 2022. URLhttps://openreview.net/forum?id=Kwm8I7dU-l5

  27. [27]

    Irregular multivariate time series forecasting: A transformable patching graph neural networks approach

    Weijia Zhang, Chenlong Yin, Hao Liu, Xiaofang Zhou, and Hui Xiong. Irregular multivariate time series forecasting: A transformable patching graph neural networks approach. InForty-first International Conference on Machine Learning, 2024

  28. [28]

    Grafiti: Graphs for forecasting irregularly sampled time series

    Vijaya Krishna Yalavarthi, Kiran Madhusudhanan, Randolf Scholz, Nourhan Ahmed, Johannes Burchert, Shayan Jawed, Stefan Born, and Lars Schmidt-Thieme. Grafiti: Graphs for forecasting irregularly sampled time series. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16255–16263, 2024

  29. [29]

    Time series as images: Vision transformer for irregularly sampled time series.Advances in Neural Information Processing Systems, 36:49187–49204, 2023

    Zekun Li, Shiyang Li, and Xifeng Yan. Time series as images: Vision transformer for irregularly sampled time series.Advances in Neural Information Processing Systems, 36:49187–49204, 2023

  30. [30]

    Promptcast: A new prompt-based learning paradigm for time series forecasting.IEEE Transactions on Knowledge and Data Engineering, 36(11):6851–6864, 2023

    Hao Xue and Flora D Salim. Promptcast: A new prompt-based learning paradigm for time series forecasting.IEEE Transactions on Knowledge and Data Engineering, 36(11):6851–6864, 2023

  31. [31]

    Large language models are zero-shot time series forecasters.Advances in neural information processing systems, 36: 19622–19635, 2023

    Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew G Wilson. Large language models are zero-shot time series forecasters.Advances in neural information processing systems, 36: 19622–19635, 2023. 11

  32. [32]

    Tempo: Prompt-based generative pre-trained transformer for time series forecasting.arXiv preprint arXiv:2310.04948, 2023

    Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, and Yan Liu. Tempo: Prompt-based generative pre-trained transformer for time series forecasting.arXiv preprint arXiv:2310.04948, 2023

  33. [33]

    One fits all: Power general time series analysis by pretrained lm.Advances in neural information processing systems, 36:43322–43355, 2023

    Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. One fits all: Power general time series analysis by pretrained lm.Advances in neural information processing systems, 36:43322–43355, 2023

  34. [34]

    Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen

    Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y . Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogramming large language models. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum? id=Unb5CVPtae

  35. [35]

    Test: Text prototype aligned embedding to activate llm’s ability for time series.arXiv preprint arXiv:2308.08241, 2023

    Chenxi Sun, Hongyan Li, Yaliang Li, and Shenda Hong. Test: Text prototype aligned embedding to activate llm’s ability for time series.arXiv preprint arXiv:2308.08241, 2023

  36. [36]

    Autotimes: Au- toregressive time series forecasters via large language models.Advances in Neural Information Processing Systems, 37:122154–122184, 2024

    Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Autotimes: Au- toregressive time series forecasters via large language models.Advances in Neural Information Processing Systems, 37:122154–122184, 2024

  37. [37]

    S2ip-llm: Semantic space informed prompt learning with llm for time series forecasting

    Zijie Pan, Yushan Jiang, Sahil Garg, Anderson Schneider, Yuriy Nevmyvaka, and Dongjin Song. S2ip-llm: Semantic space informed prompt learning with llm for time series forecasting. In Forty-first International Conference on Machine Learning, 2024

  38. [38]

    Calf: Aligning llms for time series forecasting via cross-modal fine-tuning

    Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, and Shu-Tao Xia. Calf: Aligning llms for time series forecasting via cross-modal fine-tuning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18915–18923, 2025

  39. [39]

    Timecma: Towards llm-empowered multivariate time series forecasting via cross-modality alignment

    Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, and Rui Zhao. Timecma: Towards llm-empowered multivariate time series forecasting via cross-modality alignment. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18780–18788, 2025

  40. [40]

    Multimodal llms for health grounded in individual-specific data

    Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y McLean, and Nicholas A Furlotte. Multimodal llms for health grounded in individual-specific data. InWorkshop on Machine Learning for Multimodal Healthcare Data, pages 86–102. Springer, 2023

  41. [41]

    Medtsllm: Leveraging llms for multimodal medical time series analysis.arXiv preprint arXiv:2408.07773, 2024

    Nimeesha Chan, Felix Parker, William Bennett, Tianyi Wu, Mung Yao Jia, James Fackler, and Kimia Ghobadi. Medtsllm: Leveraging llms for multimodal medical time series analysis.arXiv preprint arXiv:2408.07773, 2024

  42. [42]

    Gpt4mts: Prompt-based large language model for multimodal time-series forecasting

    Furong Jia, Kevin Wang, Yixiang Zheng, Defu Cao, and Yan Liu. Gpt4mts: Prompt-based large language model for multimodal time-series forecasting. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 23343–23351, 2024

  43. [43]

    Instructime: Advancing time series classification with multimodal language modeling

    Mingyue Cheng, Yiheng Chen, Qi Liu, Zhiding Liu, Yucong Luo, and Enhong Chen. Instructime: Advancing time series classification with multimodal language modeling. InProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pages 792–800, 2025

  44. [44]

    Chattime: A unified multimodal time series foundation model bridging numerical and textual data

    Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. Chattime: A unified multimodal time series foundation model bridging numerical and textual data. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 12694–12702, 2025

[45] Geon Lee, Wenchao Yu, Kijung Shin, Wei Cheng, and Haifeng Chen. TimeCAP: Learning to contextualize, augment, and predict time series events with large language model agents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18082–18090, 2025.

[46] Patrick Langer, Thomas Kaar, Max Rosenblattl, Maxwell A Xu, Winnie Chow, Martin Maritsch, Robert Jakob, Ning Wang, Juncheng Liu, Aradhana Verma, et al. OpenTSLM: Time-series language models for reasoning over multivariate medical text- and time-series data. arXiv preprint arXiv:2510.02410, 2025.

[47] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688, 2023.

[48] Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, and Ming Jin. Time-MoE: Billion-scale time series foundation models with mixture of experts. arXiv preprint arXiv:2409.16040, 2024.

[49] Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815, 2024.

[50] Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, et al. Chronos-2: From univariate to universal forecasting. arXiv preprint arXiv:2510.15821, 2025.

[51] Zefang Liu and Yinzhu Quan. TPP-LLM: Modeling temporal point processes by efficiently fine-tuning large language models. arXiv preprint arXiv:2410.02062, 2024.

[52] Shubham Gupta, Thibaut Durand, Graham Taylor, et al. Last stop for modeling asynchronous time series. arXiv preprint arXiv:2502.01922, 2025.

[53] Quyu Kong, Yixuan Zhang, Yang Liu, Panrong Tong, Enqi Liu, and Feng Zhou. Byte-token enhanced language models for temporal point processes analysis. In Proceedings of the ACM Web Conference 2026, pages 7013–7023, 2026.

[54] Rose Sisk, Lijing Lin, Matthew Sperrin, Jessica K Barrett, Brian Tom, Karla Diaz-Ordaz, Niels Peek, and Glen P Martin. Informative presence and observation in routine health data: a review of methodology for clinical risk prediction. Journal of the American Medical Informatics Association, 28(1):155–166, 2021.

[55] Amelia LM Tan, Emily J Getzen, Meghan R Hutch, Zachary H Strasser, Alba Gutiérrez-Sacristán, Trang T Le, Arianna Dagliati, Michele Morris, David A Hanauer, Bertrand Moal, et al. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? Journal of Biomedical Informatics, 139:104306, 2023.

[56] Roderick JA Little and Donald B Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, 2019.

[57] Haiqun Lin, Daniel O Scharfstein, and Robert A Rosenheck. Analysis of longitudinal data with irregular, outcome-dependent follow-up. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(3):791–813, 2004.

[58] Toon Vanderschueren, Alicia Curth, Wouter Verbeke, and Mihaela van der Schaar. Accounting for informative sampling when learning to forecast treatment outcomes over time. In International Conference on Machine Learning, pages 34855–34874. PMLR, 2023.

[59] Alessandro Gasparini, Keith R Abrams, Jessica K Barrett, Rupert W Major, Michael J Sweeting, Nigel J Brunskill, and Michael J Crowther. Mixed-effects models for health care longitudinal data with an informative visiting process: A Monte Carlo simulation study. Statistica Neerlandica, 74(1):5–23, 2020.

[60] Vincent Jeanselme, Glen Martin, Matthew Sperrin, Niels Peek, Brian Tom, and Jessica Barrett. Prediction of survival outcomes under clinical presence shift: A joint neural network architecture. arXiv preprint arXiv:2508.05472, 2025.

[61] Yuta Kobayashi, Vincent Jeanselme, and Shalmali Joshi. Mind the data gap: Missingness still shapes large language model prognoses. arXiv preprint arXiv:2512.00479, 2025.

[62] Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV Clinical Database Demo. PhysioNet, January 2023. doi: 10.13026/dp1f-ex47. URL https://doi.org/10.13026/dp1f-ex47. Version 2.2.

[63] Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV-Note: Deidentified free-text clinical notes. PhysioNet, January 2023. doi: 10.13026/1n74-ne17. URL https://doi.org/10.13026/1n74-ne17. Version 2.2.

[64] Joshua Robinson and David Wingate. Leveraging large language models for multiple choice question answering. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=yKbprarjc5B.

[66] Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1):96, 2019.

[67] Seyedmostafa Sheikhalishahi, Vevake Balaraman, and Venet Osmani. Benchmarking machine learning models on multi-centre eICU critical care dataset. PLOS ONE, 15(7):e0235424, 2020.

[68] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020.

[69] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[70] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36:10088–10115, 2023.

[71] Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody H Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E Gonzalez, et al. SGLang: Efficient execution of structured language model programs. Advances in Neural Information Processing Systems, 37:62557–62583, 2024.

[72] Erkin Otles, Jeeheh Oh, Benjamin Li, Michelle Bochinski, Hyeon Joo, Justin Ortwine, Erica Shenoy, Laraine Washer, Vincent B Young, Krishna Rao, et al. Mind the performance gap: examining dataset shift during prospective validation. In Machine Learning for Healthcare Conference, pages 506–534. PMLR, 2021.

[73] Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stewart. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. Advances in Neural Information Processing Systems, 29, 2016.

[74] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.

[75] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015. URL https://arxiv.org/abs/1412.6980.

[76] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.

[77] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.

[78] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.

[79] Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2Vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321, 2019.

[80] Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, et al. MedGemma technical report. arXiv preprint arXiv:2507.05201, 2025.

Showing first 80 references.