pith. machine review for the scientific record.

arxiv: 2403.07815 · v3 · submitted 2024-03-12 · 💻 cs.LG · cs.AI

Recognition: 3 theorem links · Lean Theorem

Chronos: Learning the Language of Time Series

Abdul Fatir Ansari, Andrew Gordon Wilson, Caner Turkmen, Danielle C. Maddix, Hao Wang, Huibin Shen, Jasper Zschiegner, Kari Torkkola, Lorenzo Stella, Michael Bohlke-Schneider, Michael W. Mahoney, Oleksandr Shchur, Pedro Mercado, Sebastian Pineda Arango, Shubham Kapoor, Syama Sundar Rangapuram, Xiyuan Zhang, Yuyang Wang

Pith reviewed 2026-05-13 08:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords time series forecasting · pretrained models · transformer · zero-shot learning · probabilistic forecasting · tokenization · synthetic data · T5

The pith

Pretrained transformers match or beat specialized models on new time series forecasting tasks after tokenizing values into a fixed vocabulary.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chronos shows that time series forecasting can be reframed as a language modeling problem by first scaling and quantizing the numerical values into discrete tokens drawn from a fixed vocabulary. Standard transformer architectures from the T5 family are then pretrained on this tokenized data with the cross-entropy loss, using a large mix of public real-world datasets and additional synthetic series generated from Gaussian processes. The resulting models significantly outperform baselines on data they saw during training, and on entirely new datasets their zero-shot accuracy is comparable to, and occasionally better than, that of methods trained specifically for those datasets. This matters because it offers a route to general-purpose forecasting tools, removing the need to build and retrain a fresh model for every new task or domain.

Core claim

Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. Models based on the T5 family, ranging from 20M to 710M parameters, are pretrained on a large collection of publicly available datasets augmented by Gaussian-process synthetic data. In a benchmark of 42 datasets, these models significantly outperform other methods on training-corpus data and achieve comparable or occasionally superior zero-shot performance on new datasets relative to methods trained specifically on them.

What carries the argument

Tokenization of continuous time series values into a fixed vocabulary through scaling and quantization, which converts the forecasting problem into standard language-model training with cross-entropy loss.
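
To make the mechanism concrete, here is a minimal Python sketch of a scale-then-quantize tokenizer of the kind described; the per-series mean scaling, the bin count, and the bin range are illustrative assumptions, not the paper's exact configuration.

    import numpy as np

    def tokenize(series, n_bins=4094, low=-15.0, high=15.0):
        # Scale by the mean absolute value, then bucket the scaled values
        # into uniformly spaced bins over [low, high]. Illustrative only.
        scale = float(np.mean(np.abs(series))) or 1.0
        edges = np.linspace(low, high, n_bins - 1)
        tokens = np.digitize(series / scale, edges)  # ids in 0 .. n_bins-1
        return tokens, scale

    def detokenize(tokens, scale, n_bins=4094, low=-15.0, high=15.0):
        # Invert up to quantization error: map each token to its bin
        # center and undo the per-series scaling.
        centers = np.linspace(low, high, n_bins)
        return centers[np.clip(tokens, 0, n_bins - 1)] * scale

Once values are tokens, training is ordinary language modeling: the token sequence is fed to a T5-style model and next-token probabilities are fit with cross-entropy.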

Load-bearing premise

Scaling and quantization into a fixed vocabulary must preserve enough information to support accurate probabilistic forecasts, and the added Gaussian-process synthetic data must improve rather than harm generalization to real-world distributions.
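
The abstract says only that the synthetic data is generated via Gaussian processes. A minimal sketch of sampling one such series follows, assuming a zero-mean GP prior whose kernel is the sum of an RBF term (smooth trend) and a periodic term (seasonality); the kernel forms and hyperparameters are illustrative assumptions.

    import numpy as np

    def sample_gp_series(length=256, seed=None):
        # Draw one series from a zero-mean GP prior; the covariance is an
        # RBF kernel plus a periodic kernel, with jitter for stability.
        rng = np.random.default_rng(seed)
        t = np.arange(length, dtype=float)[:, None]
        d = t - t.T  # pairwise time offsets
        rbf = np.exp(-d ** 2 / (2 * 50.0 ** 2))
        periodic = np.exp(-2 * np.sin(np.pi * d / 24.0) ** 2 / 0.5 ** 2)
        cov = rbf + periodic + 1e-6 * np.eye(length)
        return rng.multivariate_normal(np.zeros(length), cov)

Both kernels here are stationary, which is why the premise is load-bearing: transfer to non-stationary real-world series has to be shown empirically, not assumed.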

What would settle it

On a new real-world dataset never seen in pretraining, if a Chronos model used zero-shot produces substantially higher error than a model trained from scratch specifically on that same dataset, the claim of effective zero-shot transfer would be falsified.
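
The test is directly operational: score both models with the same error metric on the same held-out split. A minimal sketch using MASE (mean absolute scaled error); the metric choice is a common forecasting convention, not a prescription from the paper.

    import numpy as np

    def mase(y_true, y_pred, y_train, season=1):
        # Forecast MAE divided by the in-sample MAE of the seasonal-naive
        # forecaster; values above 1 mean worse than seasonal-naive.
        naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
        return np.mean(np.abs(y_true - y_pred)) / naive_mae

    # The claim fails if, on a dataset absent from pretraining,
    # mase(test, zero_shot_pred, train) substantially exceeds
    # mase(test, task_specific_pred, train).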

read the original abstract

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces Chronos, a framework that tokenizes time series values via scaling and quantization into a fixed vocabulary and trains T5-family transformer models (20M–710M parameters) with cross-entropy loss. Models are pretrained on a large collection of public datasets augmented by Gaussian-process synthetic data; a benchmark on 42 datasets shows significant outperformance versus classical and deep-learning baselines on in-distribution tasks and comparable or superior zero-shot performance on unseen datasets.

Significance. If the central performance claims hold after verification of data integrity, the work establishes that language-model pretraining can be effectively transferred to probabilistic time series forecasting, leveraging cross-domain data to simplify pipelines and improve zero-shot accuracy. The scale of the benchmark (42 datasets, multiple model sizes) and explicit comparison to both local and deep baselines are strengths; the approach also supplies a concrete, reproducible tokenization recipe that could serve as a baseline for future pretrained time-series models.

major comments (3)
  1. [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.
  2. [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood (a model-free round-trip check is sketched just after this list).
  3. [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.
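
The round-trip check asked for in major comment 2 needs no trained model: it measures the error that tokenize-then-detokenize alone introduces. A minimal sketch, reusing the illustrative tokenize/detokenize pair sketched earlier under "What carries the argument"; the heavy-tailed test input is a hypothetical example.

    import numpy as np

    def roundtrip_rel_error(series, tokenize, detokenize):
        # Relative reconstruction error of quantization alone; large values
        # on heavy-tailed series would mean the discretization itself
        # biases the likelihood before any model is trained.
        tokens, scale = tokenize(series)
        recon = detokenize(tokens, scale)
        return np.abs(recon - series) / (np.abs(series) + 1e-8)

    # Heavy-tailed input where a fixed quantization range can clip:
    heavy = np.random.default_rng(0).standard_t(df=2, size=1000)
    print(roundtrip_rel_error(heavy, tokenize, detokenize).mean())
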
minor comments (3)
  1. Clarify the precise probabilistic metrics (CRPS, quantile loss, etc.) and their normalization; state whether they are computed on the original scale or the quantized tokens (a sketch of one weighted-quantile-loss convention follows this list).
  2. In result tables, report both mean and standard deviation across random seeds or cross-validation folds for the largest Chronos model.
  3. Add a short paragraph comparing the chosen T5 architecture and training recipe to prior time-series transformer work (e.g., Informer, Autoformer) to highlight the novelty of the quantization-plus-LM approach.
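
For minor comment 1, the metric in question can be pinned down in a few lines. A minimal sketch of one common weighted quantile loss convention (pinball loss summed over quantile levels, normalized by the total absolute target, computed on the original rather than the tokenized scale); other normalizations exist, so this is an assumption about convention, not necessarily the paper's definition.

    import numpy as np

    def weighted_quantile_loss(y_true, q_pred, quantiles):
        # q_pred: shape (len(quantiles), len(y_true)), the predicted
        # quantile tracks. Returns a scale-free aggregate score.
        y = np.asarray(y_true, dtype=float)
        total = 0.0
        for q, pred in zip(quantiles, q_pred):
            diff = y - np.asarray(pred, dtype=float)
            total += np.sum(np.maximum(q * diff, (q - 1) * diff))
        return 2 * total / (len(quantiles) * np.sum(np.abs(y)))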

Simulated Authors' Rebuttal

3 responses · 0 unresolved

Thank you for the detailed and insightful review of our paper on Chronos. We have carefully considered each of the major comments and provide point-by-point responses below. We will incorporate the suggested changes into the revised manuscript to address the concerns regarding experimental transparency, tokenization validation, and synthetic data ablation.

read point-by-point responses
  1. Referee: [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.

    Authors: We agree that an explicit partition is necessary to substantiate the zero-shot claims and rule out any possibility of leakage. In the revised manuscript, we will add a dedicated table in Section 4 that enumerates all 42 datasets, indicates precisely which ones were included in the pretraining corpus, and confirms that the zero-shot evaluation sets have zero overlap with the pretraining data. This will clearly separate the in-distribution and out-of-distribution results. revision: yes

  2. Referee: [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood.

    Authors: We recognize that validating the discretization step is important for the probabilistic interpretation. While the current implementation uses a vocabulary size of 4096 with per-series mean scaling, we will add an ablation study in the revised version. This will compare vocabulary sizes (1024, 4096, 16384) and alternative scaling rules, with specific analysis on high-dynamic-range and heavy-tailed series subsets using CRPS and coverage to confirm that the learned distributions remain faithful rather than coarsened. revision: yes

  3. Referee: [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.

    Authors: We agree that a controlled ablation is required to quantify the contribution of the GP synthetic data. In the revision, we will include a direct comparison of models trained with and without the GP-augmented data, reporting zero-shot CRPS and coverage metrics on the full benchmark. This will demonstrate the net effect on non-stationary real-world series and address the concern about the stationary GP prior. revision: yes

Circularity Check

0 steps flagged

No significant circularity in Chronos derivation chain

full rationale

The paper's core derivation consists of tokenizing time series via scaling and quantization into a fixed vocabulary, then training T5-based transformers with cross-entropy loss on a mix of public datasets and GP-generated synthetic data. Performance claims rest on empirical benchmarks across 42 datasets, explicitly distinguishing in-corpus results from zero-shot evaluation on held-out datasets never used in pretraining. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citations justify uniqueness or ansatzes, and no renaming of known results occurs. The framework is self-contained through standard pretraining and independent evaluation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that discrete tokenization of scaled values is sufficient for probabilistic forecasting and that synthetic GP data aids transfer without introducing harmful distribution shift.

free parameters (2)
  • quantization vocabulary size
    Number of discrete tokens chosen for the fixed vocabulary; value not stated in abstract.
  • per-series scaling rule
    Method and parameters used to normalize each series before quantization; not detailed in abstract.
axioms (1)
  • domain assumption: Time series values, once scaled and quantized into a fixed vocabulary, retain enough information for accurate probabilistic forecasting.
    This premise enables the language-model training pipeline and is invoked in the tokenization step described in the abstract.

pith-pipeline@v0.9.0 · 5560 in / 1311 out tokens · 45903 ms · 2026-05-13T08:21:55.992531+00:00 · methodology

discussion (0)


Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning

    cs.AI 2026-05 unverdicted novelty 7.0

    TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.

  2. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...

  3. FactoryBench: Evaluating Industrial Machine Understanding

    cs.AI 2026-05 unverdicted novelty 7.0

    FactoryBench reveals that frontier LLMs achieve under 50% on structured causal questions and under 18% on decision-making in industrial robotic telemetry.

  4. Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    xMAE pretrains biosignal representations via masked cross-modal reconstruction of temporally ordered signals like ECG and PPG, outperforming baselines on 15 of 19 downstream tasks including cardiovascular prediction a...

  5. Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models

    cs.LG 2026-04 unverdicted novelty 7.0

    Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.

  6. Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring

    cs.LG 2026-04 unverdicted novelty 7.0

    A model-agnostic adaptive conformal anomaly detection approach uses weighted quantile bounds learned from past foundation model predictions to deliver interpretable p-value scores with stable calibration under shifts ...

  7. TempusBench: An Evaluation Framework for Time-Series Forecasting

    cs.LG 2026-04 unverdicted novelty 7.0

    TempusBench is a new evaluation framework for time-series forecasting models that supplies fresh non-overlapping datasets, tasks beyond horizon and domain, consistent tuning across models, and visualization tools.

  8. TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

    cs.AI 2026-04 conditional novelty 7.0

    TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.

  9. LoRM: Learning the Language of Rotating Machinery for Self-Supervised Condition Monitoring

    cs.CL 2026-04 unverdicted novelty 7.0

    LoRM is a self-supervised framework that models multi-modal rotating machinery signals as token sequences for prediction with fine-tuned language models, using prediction errors to monitor machine health in real time.

  10. MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

    cs.LG 2026-05 unverdicted novelty 6.0

    MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.

  11. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...

  12. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...

  13. RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

    cs.LG 2026-05 unverdicted novelty 6.0

    RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.

  14. Continuity Laws for Sequential Models

    cs.LG 2026-05 unverdicted novelty 6.0

    S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.

  15. Can Transformers predict system collapse in dynamical systems?

    nlin.CD 2026-05 unverdicted novelty 6.0

    Transformers fail to predict catastrophic collapse in unseen parameter regimes of nonlinear dynamical systems, while reservoir computing reliably succeeds.

  16. FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

    cs.LG 2026-04 unverdicted novelty 6.0

    Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.

  17. Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

    cs.LG 2026-04 unverdicted novelty 6.0

    Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...

  18. Predicting Power-System Dynamic Trajectories with Foundation Models

    cs.AI 2026-04 unverdicted novelty 6.0

    LASS-ODE-Power is a pretrained model that predicts power-system dynamic trajectories across regimes in a zero-shot manner after large-scale ODE pretraining and targeted fine-tuning.

  19. FM-CAC: Carbon-Aware Control for Battery-Buffered Edge AI via Time-Series Foundation Models

    eess.SY 2026-04 unverdicted novelty 6.0

    FM-CAC uses battery buffering and time-series foundation models for zero-shot carbon forecasting in a dynamic programming optimizer to reduce edge AI carbon emissions by up to 65.6% with near-maximum accuracy.

  20. A Quantum Inspired Variational Kernel and Explainable AI Framework for Cross Region Solar and Wind Energy Forecasting

    cs.CL 2026-05 unverdicted novelty 5.0

    A hybrid classical-plus-quantum-inspired framework for cross-region renewable energy forecasting matches top baselines within 1% accuracy and separates calm versus stormy conditions with a 15-fold higher Fisher discri...

  21. Sessa: Selective State Space Attention

    cs.LG 2026-04 unverdicted novelty 5.0

    Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.

  22. Wearable AI in the Era of Large Sensor Models

    eess.SP 2026-04 unverdicted novelty 5.0

    Large Sensor Models trained on large-scale multimodal wearable data can provide a scalable, general framework for wearable AI by learning transferable representations across modalities and tasks.

  23. Thermal-GEMs: Generalized Models for Building Thermal Dynamics

    eess.SY 2026-04 unverdicted novelty 5.0

    Multi-source transfer learning for building thermal dynamics yields up to 63% lower forecasting errors than single-source models and outperforms time series foundation models when pretrained on 16-32 buildings over one year.

  24. Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook

    eess.SP 2026-04 accept novelty 5.0

    The survey organizes foundation models for sensor-based HAR into a lifecycle taxonomy and identifies three trajectories: HAR-specific models from scratch, adaptation of general time-series models, and integration with...

  25. Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications

    eess.SY 2026-04 unverdicted novelty 4.0

    The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.

  26. Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction

    cs.LG 2026-05 unverdicted novelty 3.0

    Chronos encodes frequency content in decoder representations with quality that varies across the spectrum, as revealed by minimum description length probes on sinusoid inputs.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · cited by 25 Pith papers · 11 internal anchors

  1. [1]

    GluonTS: Probabilistic and Neural Time Series Modeling in Python

    Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, et al. GluonTS: Probabilistic and Neural Time Series Modeling in Python. The Journal of Machine Learning Research, 21(1): 4629-4634, 2020

  2. [2]

    Deep Explicit Duration Switching Models for Time Series

    Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J Smola, Bernie Wang, and Tim Januschowski. Deep Explicit Duration Switching Models for Time Series . Advances in Neural Information Processing Systems, 34, 2021

  3. [3]

    Neural continuous-discrete state space models for irregularly-sampled time series

    Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. Neural continuous-discrete state space models for irregularly-sampled time series. In International Conference on Machine Learning, pp. 926-951. PMLR, 2023

  4. [4]

The theta model: a decomposition approach to forecasting

    V. Assimakopoulos and K. Nikolopoulos. The theta model: a decomposition approach to forecasting. International Journal of Forecasting, 16(4): 521-530, 2000

  5. [5]

The tourism forecasting competition

    George Athanasopoulos, Rob J. Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting competition. International Journal of Forecasting, 27(3): 822-844, 2011

  6. [6]

    Deep learning for time series forecasting: Tutorial and literature survey

    Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, François-Xavier Aubet, Laurent Callot, and Tim Januschowski. Deep learning for time series forecasting: Tutorial and literature survey. ACM Comput. Surv., 55(6), 2022

  7. [7]

    Multi-objective model selection for time series forecasting

    Oliver Borchert, David Salinas, Valentin Flunkert, Tim Januschowski, and Stephan Günnemann. Multi-objective model selection for time series forecasting. arXiv preprint arXiv:2202.08485, 2022

  8. [8]

Language Models are Few-Shot Learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...

  9. [9]

    Neural Contextual Anomaly Detection for Time Series

    Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. Neural Contextual Anomaly Detection for Time Series. arXiv:2107.07702, 2021

  10. [10]

    N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

    Cristian Challu, Kin G Olivares, Boris N Oreshkin, Federico Garza Ramirez, Max Mergenthaler Canseco, and Artur Dubrawski. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting . In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023

  11. [11]

    A neural network approach to ordinal regression

    Jianlin Cheng, Zheng Wang, and Gianluca Pollastri. A neural network approach to ordinal regression. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp. 1279-1284. IEEE, 2008

  12. [12]

    PaLM: Scaling Language Modeling with Pathways

    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research, 24(240): 1-113, 2023

  13. [13]

    Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling Instruction-Finetuned Language Models . arXiv:2210.11416, 2022

  14. [14]

    FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

    Tri Dao. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning . arXiv:2307.08691, 2023

  15. [15]

TSMix: time series data augmentation by mixing sources

    Luke Nicholas Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, and Adam Barker. TSMix: time series data augmentation by mixing sources. In Proceedings of the 3rd Workshop on Machine Learning and Systems, pp. 109-114, 2023

  16. [16]

A decoder-only foundation model for time-series forecasting

    Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv:2310.10688, 2023

  17. [17]

The UCR Time Series Classification Archive, October 2018

    Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. The UCR Time Series Classification Archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  18. [18]

    LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale . arXiv:2208.07339, 2022

  19. [19]

SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling

    Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling. arXiv:2302.00861, 2023

  20. [20]

    ForecastPFN: Synthetically-Trained Zero-Shot Forecasting

    Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, and Colin White. ForecastPFN: Synthetically-Trained Zero-Shot Forecasting . In Advances in Neural Information Processing Systems, 2023

  21. [21]

    Structure Discovery in Nonparametric Regression through Compositional Kernel Search

    David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In International Conference on Machine Learning, pp. 1166-1174. PMLR, 2013

  22. [22]

    BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

    Patrick Emami, Abhijeet Sahu, and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting . arXiv:2307.00142, 2023

  23. [23]

Hierarchical Neural Story Generation

    Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical Neural Story Generation . arXiv:1805.04833, 2018

  24. [24]

    Stop regressing: Training value functions via classification for scalable deep rl

    Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, et al. Stop regressing: Training value functions via classification for scalable deep RL. arXiv preprint arXiv:2403.03950, 2024

  25. [25]

    How not to lie with statistics: the correct way to summarize benchmark results

    Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29(3): 218-221, 1986

  26. [26]

    Beam Search Strategies for Neural Machine Translation

    Markus Freitag and Yaser Al-Onaizan. Beam Search Strategies for Neural Machine Translation . arXiv:1702.01806, 2017

  27. [27]

    Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding , November 2023

    Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding , November 2023. URL https://lmsys.org/blog/2023-11-21-lookahead-decoding/

  28. [28]

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling . arXiv:2101.00027, 2020

  29. [29]

StatsForecast: Lightning fast forecasting with statistical and econometric models

    Federico Garza, Max Mergenthaler Canseco, Cristian Challú, and Kin G. Olivares. StatsForecast: Lightning fast forecasting with statistical and econometric models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/statsforecast

  30. [30]

    Probabilistic Forecasting with Spline Quantile Function RNNs

    Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. ...

  31. [31]

    Strictly proper scoring rules, prediction, and estimation

    Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477): 359-378, 2007

  32. [32]

Monash Time Series Forecasting Archive

    Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash Time Series Forecasting Archive. In Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

  33. [33]

    Moment: A family of open time-series foundation models

    Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024

  34. [34]

    Large Language Models Are Zero-Shot Time Series Forecasters

    Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large Language Models Are Zero-Shot Time Series Forecasters . In Advances in Neural Information Processing Systems, 2023

  35. [35]

    The Curious Case of Neural Text Degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv:1904.09751, 2019

  36. [36]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models . In International Conference on Learning Representations, 2022

  37. [37]

    Transformer-based deep survival analysis

    Shi Hu, Egill Fridgeirsson, Guido van Wingen, and Max Welling. Transformer-based deep survival analysis. In Survival Prediction-Algorithms, Challenges and Applications, pp. 132-148. PMLR, 2021

  38. [38]

    Forecasting with exponential smoothing: the state space approach

    Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008

  39. [39]

    Forecasting: principles and practice

    Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2018

  40. [40]

    Another look at measures of forecast accuracy

    Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4): 679-688, 2006

  41. [41]

    Deep learning for time series classification: a review

    Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery, 33(4): 917-963, 2019

  42. [42]

Time-LLM: Time series forecasting by reprogramming large language models

    Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogramming large language models. In The Twelfth International Conference on Learning Representations, 2024

  43. [43]

    Domain adaptation for time series forecasting via attention sharing

    Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In International Conference on Machine Learning, pp. 10280-10297. PMLR, 2022

  44. [44]

    LightGBM: A Highly Efficient Gradient Boosting Decision Tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree . Advances in neural information processing systems, 30, 2017

  45. [45]

    Quantile regression

    Roger Koenker and Kevin F Hallock. Quantile regression. Journal of Economic Perspectives, 15(4): 143-156, 2001

  46. [46]

    A classification of business forecasting problems

    Stephan Kolassa and Tim Januschowski. A classification of business forecasting problems. Foresight, 52, 2019

  47. [47]

    Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

    Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Wang. Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting. In Advances in Neural Information Processing Systems, volume 36, pp. 28341-28364. Curran Associates, Inc., 2023

  48. [48]

    Fast inference from transformers via speculative decoding

    Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning, pp. 19274-19286. PMLR, 2023

  49. [49]

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension . arXiv:1910.13461, 2019

  50. [50]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting

    Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4): 1748-1764, 2021

  51. [51]

LargeST: A benchmark dataset for large-scale traffic forecasting

    Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. LargeST: A benchmark dataset for large-scale traffic forecasting. arXiv:2306.08259, 2023

  52. [52]

    The M3-Competition: results, conclusions and implications

    Spyros Makridakis and Michele Hibon. The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4): 451-476, 2000

  53. [53]

    Accuracy of forecasting: An empirical investigation

    Spyros Makridakis, Michele Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society. Series A (General), 142(2): 97-145, 1979

  54. [54]

    The M4 Competition: 100,000 time series and 61 forecasting methods

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1): 54-74, 2020

  55. [55]

    M5 accuracy competition: Results, findings, and conclusions

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38(4): 1346-1364, 2022

  56. [56]

    Regression models for ordinal data

    Peter McCullagh. Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2): 109-127, 1980

  57. [57]

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv:1609.07843, 2016

  58. [58]

    Large language models as general pattern machines

    Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, and Andy Zeng. Large language models as general pattern machines. In Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, pp. 2498-2518. PMLR, 2023

  59. [59]

A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023

  60. [60]

NeuralForecast: User friendly state-of-the-art neural forecasting models

    Kin G. Olivares, Cristian Challú, Federico Garza, Max Mergenthaler Canseco, and Artur Dubrawski. NeuralForecast: User friendly state-of-the-art neural forecasting models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/neuralforecast

  61. [61]

    WaveNet: A Generative Model for Raw Audio

    Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv:1609.03499, 2016

  62. [62]

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2020

  63. [63]

Meta-learning framework with applications to zero-shot time-series forecasting

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

  64. [64]

Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks

    Bernardo Pérez Orozco and Stephen J. Roberts. Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks. In 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 503-508, 2020

  65. [65]

    Learning quantile functions without quantile crossing for distribution-free time series forecasting

    Youngsuk Park, Danielle Maddix, François-Xavier Aubet, Kelvin Kan, Jan Gasthaus, and Yuyang Wang. Learning quantile functions without quantile crossing for distribution-free time series forecasting. In International Conference on Artificial Intelligence and Statistics, pp. 8127-8150. PMLR, 2022

  66. [66]

    A simple combination of univariate models

    Fotios Petropoulos and Ivan Svetunkov. A simple combination of univariate models. International Journal of Forecasting, 36(1): 110-115, 2020

  67. [67]

    The effectiveness of discretization in forecasting: An empirical study on neural time series models

    Stephan Rabanser, Tim Januschowski, Valentin Flunkert, David Salinas, and Jan Gasthaus. The effectiveness of discretization in forecasting: An empirical study on neural time series models. arXiv:2005.10111, 2020

  68. [68]

    Language models are unsupervised multitask learners

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8): 9, 2019

  69. [69]

    Exploring the limits of transfer learning with a unified text-to-text transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1): 5485-5551, 2020

  70. [70]

    Integrating multimodal information in large pretrained transformers

    Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, and Ehsan Hoque. Integrating multimodal information in large pretrained transformers. In Proceedings of the conference. Association for Computational Linguistics. Meeting, volume 2020, pp. 2359. NIH Public Access, 2020

  71. [71]

    Deep state space models for time series forecasting

    Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018

  72. [72]

    Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting

    Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International Conference on Machine Learning, pp. 8857-8868. PMLR, 2021

  73. [73]

    Lag-llama: Towards foundation models for time series forecasting, 2023

    Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-llama: Towards foundation models for time series forecasting, 2023

  74. [74]

    Conformalized quantile regression

    Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. Advances in neural information processing systems, 32, 2019

  75. [75]

    Latent ordinary differential equations for irregularly-sampled time series

    Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems, 32, 2019

  76. [76]

DeepAR: Probabilistic forecasting with autoregressive recurrent networks

    David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3): 1181-1191, 2020

  77. [77]

    Neural Machine Translation of Rare Words with Subword Units

    Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv:1508.07909, 2015

  78. [78]

AutoGluon-TimeSeries: AutoML for probabilistic time series forecasting

    Oleksandr Shchur, Ali Caner Turkmen, Nick Erickson, Huibin Shen, Alexander Shirkov, Tony Hu, and Bernie Wang. AutoGluon-TimeSeries: AutoML for probabilistic time series forecasting. In International Conference on Automated Machine Learning, pp. 9-1. PMLR, 2023

  79. [79]

    Conformal time-series forecasting

    Kamile Stankeviciute, Ahmed M Alaa, and Mihaela van der Schaar. Conformal time-series forecasting. Advances in Neural Information Processing Systems, 34: 6216-6228, 2021

  80. [80]

    Regression as classification: Influence of task formulation on neural network features

    Lawrence Stewart, Francis Bach, Quentin Berthet, and Jean-Philippe Vert. Regression as classification: Influence of task formulation on neural network features. In International Conference on Artificial Intelligence and Statistics, pp. 11563-11582. PMLR, 2023

Showing first 80 references.