Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Alexandre Drouin; Anderson Schneider; Andrew Robert Williams; Arian Khorasani; Arjun Ashok; George Adamopoulos; Hena Ghonia; Irina Rish; Kashif Rasul; Marin Biloš

arxiv: 2310.08278 · v3 · pith:75ZQCHZCnew · submitted 2023-10-12 · 💻 cs.LG · cs.AI

Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish

classification cs.LG cs.AI

keywords foundation models series time forecasting lag-llama capabilities data

Abstract

Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches and emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state of the art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.
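
To make the phrase "lags as covariates" concrete, here is a minimal Python sketch of how lagged values of a univariate series can be turned into per-timestep input features. It is an illustration only, not the paper's implementation: the function name `lag_features` and the lag set `[1, 2, 7]` are hypothetical choices made for this example, and the abstract does not specify which lags the model uses.

```python
def lag_features(series, lags):
    """Build lagged covariates for a univariate series.

    For each time step t with enough history, the feature row holds
    series[t - lag] for every lag, and the target is series[t].
    The lag set is a free parameter here (an assumption of this sketch).
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


# Toy series 0.0, 1.0, ..., 9.0 with hypothetical lags of 1, 2 and 7 steps.
values = [float(v) for v in range(10)]
X, y = lag_features(values, lags=[1, 2, 7])

for features, target in zip(X, y):
    print(features, "->", target)
```

Each row of `X` could then be fed to a sequence model as the covariate vector for one time step. The first `max(lags)` observations are dropped because they lack a full set of lagged values.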

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models
cs.LG 2026-05 unverdicted novelty 8.0

FactoryNet is the first universal pretraining corpus for industrial time-series data with a shared S-E-F-C schema that supports cross-embodiment transfer and competitive anomaly detection.
UC-Search: Risk-Aware Test-Time Search for Delayed Constrained Time-Series Control
cs.LG 2026-06 unverdicted novelty 7.0

UC-Search is a model-agnostic test-time wrapper that adds feasibility-automaton search and uncertainty-based risk adjustment to produce better delayed constrained control than CEM, MPPI, and risk-random baselines on p...
GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring
cs.LG 2026-05 unverdicted novelty 7.0

GlucoFM decomposes CGM traces into dual state-event streams, pretrains on 109k hours of unlabeled data, and reports superior subject-disjoint performance on seven clinical tasks across four cohorts.
SurF: A Generative Model for Multivariate Irregular Time Series Forecasting
cs.LG 2026-05 unverdicted novelty 7.0

SurF applies the Time Rescaling Theorem as a learnable bijection to create a single generative model for forecasting irregular multivariate event streams that outperforms or matches baselines on six benchmarks.
TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
cs.AI 2026-05 unverdicted novelty 7.0

TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.
FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models
cs.LG 2026-05 unverdicted novelty 7.0

FactoryNet is a 51M-point industrial time-series dataset with an S-E-F-C schema that supports zero-shot cross-embodiment transfer and competitive anomaly detection across robotic and machining tasks.
Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
cs.LG 2026-04 unverdicted novelty 7.0

Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.
TempusBench: An Evaluation Framework for Time-Series Forecasting
cs.LG 2026-04 unverdicted novelty 7.0

TempusBench is a new evaluation framework for time-series forecasting models that supplies fresh non-overlapping datasets, tasks beyond horizon and domain, consistent tuning across models, and visualization tools.
Sundial: A Family of Highly Capable Time Series Foundation Models
cs.LG 2025-02 conditional novelty 7.0

Sundial uses TimeFlow Loss for native pre-training of Transformers on continuous time series from TimeBench, achieving SOTA point and probabilistic forecasting with millisecond inference.
Deep Time Series Models: A Comprehensive Survey and Benchmark
cs.LG 2024-07 unverdicted novelty 7.0

This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering
cs.CL 2026-06 unverdicted novelty 6.0

CADE framework uses direct timestep embedding and supervised contrastive alignment to improve time-series question answering, reporting gains on six tasks in the Time-MQA benchmark over LLM baselines.
TimeRouter: Efficient and Adaptive Routing of Time-Series Foundation Models
cs.LG 2026-06 unverdicted novelty 6.0

TimeRouter routes among time-series foundation models via discriminative routing, selective gating and ensemble fallback, reporting SOTA LB MASE 0.6765 on GIFT-EVAL.
Conditional Imputation for Within-Modality Missingness in Multi-Modal Federated Learning
cs.LG 2026-04 unverdicted novelty 6.0

CondI applies conditional diffusion models in a two-phase federated pipeline to impute within-modality missing data, then trains extractors on the completed inputs for downstream tasks on clinical datasets.
Zeus: Towards Tuning-Free Foundation Model for Time Series Analysis
cs.LG 2026-07 unverdicted novelty 5.0

Zeus proposes a multi-scale Transformer with point-wise tokenization and Multi-Objective Temporal Masking to enable tuning-free performance on forecasting, interpolation, and other time series tasks.
Benchmarking Deep Time Series Models for Equity Portfolios
math.OC 2026-06 unverdicted novelty 5.0

Benchmark of 15 time-series architectures on equity portfolios finds no model dominates, with TransEnc-8 at 0.352 rank-1 acceptability and all promoted models showing negative net Sharpe at 20 bps costs under constraints.
PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows
cs.LG 2026-05 unverdicted novelty 5.0

PaP-NF uses prefix-as-prompt reprogramming of a frozen LLM to extract global context that conditions a normalizing flow decoder, producing probabilistic long-term time series forecasts evaluated by CRPS.
Heterogeneous Scientific Foundation Model Collaboration
cs.AI 2026-04 unverdicted novelty 5.0

Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
Thermal-GEMs: Generalized Models for Building Thermal Dynamics
eess.SY 2026-04 unverdicted novelty 5.0

Multi-source transfer learning for building thermal dynamics yields up to 63% lower forecasting errors than single-source models and outperforms time series foundation models when pretrained on 16-32 buildings over one year.
Time Series Analysis in Machine Learning
astro-ph.IM 2026-06 unverdicted novelty 1.0

A review chapter covering basic time series concepts, classical models like ARIMA, and ML approaches including neural networks and transformers.