Chronos: Learning the Language of Time Series
Pith reviewed 2026-05-13 08:21 UTC · model grok-4.3
The pith
Pretrained transformers match or beat specialized models on new time series forecasting tasks after tokenizing values into a fixed vocabulary.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. Models based on the T5 family, ranging from 20M to 710M parameters, are pretrained on a large collection of publicly available datasets augmented with Gaussian-process synthetic data. On a benchmark of 42 datasets, these models significantly outperform other methods on datasets drawn from the training corpus and achieve comparable, occasionally superior, zero-shot performance on new datasets relative to methods trained specifically on them.
What carries the argument
Tokenization of continuous time series values into a fixed vocabulary through scaling and quantization, which converts the forecasting problem into standard language-model training with cross-entropy loss.
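A minimal sketch of this style of tokenizer, assuming mean scaling and uniform binning (the bin count, clipping range, and special-token handling here are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def tokenize(series, n_bins=4094, low=-15.0, high=15.0):
    # Mean-scale the series so one vocabulary covers many magnitudes,
    # then map each scaled value to a uniform bin id.
    scale = float(np.mean(np.abs(series))) or 1.0  # guard all-zero series
    scaled = np.clip(series / scale, low, high)
    edges = np.linspace(low, high, n_bins + 1)
    tokens = np.clip(np.digitize(scaled, edges) - 1, 0, n_bins - 1)
    return tokens, scale

def detokenize(tokens, scale, n_bins=4094, low=-15.0, high=15.0):
    # Invert by mapping each token to its bin center and rescaling.
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[tokens] * scale

series = np.array([10.0, 12.0, 9.0, 14.0, 11.0])
tokens, scale = tokenize(series)
recovered = detokenize(tokens, scale)  # equals the input up to bin width
```

Once series are integer sequences, next-token prediction with cross-entropy is exactly the standard language-modeling setup; forecasts are sampled autoregressively and detokenized back to the original scale.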
Load-bearing premise
Scaling and quantization into a fixed vocabulary must preserve enough information to support accurate probabilistic forecasts, and the added Gaussian-process synthetic data must improve rather than harm generalization to real-world distributions.
What would settle it
On a new real-world dataset never seen in pretraining, if a Chronos model used zero-shot produces substantially higher error than a model trained from scratch specifically on that same dataset, the claim of effective zero-shot transfer would be falsified.
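A hedged way to operationalize this test, using MASE as the scale-free error (the metric choice and the 10% margin are assumptions, not thresholds from the paper):

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    # Mean absolute scaled error: forecast MAE divided by the
    # in-sample seasonal-naive MAE (Hyndman & Koehler, 2006).
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

rng = np.random.default_rng(0)
y_train, y_true = rng.normal(size=200), rng.normal(size=24)
zero_shot_pred = rng.normal(size=24)  # stand-in for zero-shot forecasts
local_pred = rng.normal(size=24)      # stand-in for a per-dataset model

zs = mase(y_true, zero_shot_pred, y_train)
loc = mase(y_true, local_pred, y_train)
claim_falsified = zs > 1.10 * loc  # "substantially higher" needs a margin
```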
Original abstract
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Chronos, a framework that tokenizes time series values via scaling and quantization into a fixed vocabulary and trains T5-family transformer models (20M–710M parameters) with cross-entropy loss. Models are pretrained on a large collection of public datasets augmented by Gaussian-process synthetic data; a benchmark on 42 datasets shows significant outperformance versus classical and deep-learning baselines on in-distribution tasks and comparable or superior zero-shot performance on unseen datasets.
Significance. If the central performance claims hold after verification of data integrity, the work establishes that language-model pretraining can be effectively transferred to probabilistic time series forecasting, leveraging cross-domain data to simplify pipelines and improve zero-shot accuracy. The scale of the benchmark (42 datasets, multiple model sizes) and explicit comparison to both local and deep baselines are strengths; the approach also supplies a concrete, reproducible tokenization recipe that could serve as a baseline for future pretrained time-series models.
major comments (3)
- [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.
- [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood (a round-trip distortion check is sketched after this list).
- [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.
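A minimal sketch of the round-trip check the second comment asks for: measure the relative distortion introduced by scale-then-quantize on a heavy-tailed series as vocabulary size varies (the Student-t generator, bin counts, and clipping range are illustrative assumptions):

```python
import numpy as np

def roundtrip_error(series, n_bins, low=-15.0, high=15.0):
    # Relative L1 distortion after scale -> quantize -> dequantize.
    scale = np.mean(np.abs(series))
    scaled = np.clip(series / scale, low, high)
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    idx = np.clip(np.digitize(scaled, edges) - 1, 0, n_bins - 1)
    recon = centers[idx] * scale
    return np.mean(np.abs(series - recon)) / np.mean(np.abs(series))

rng = np.random.default_rng(0)
heavy_tailed = rng.standard_t(df=2.5, size=10_000)
for n_bins in (1024, 4096, 16384):
    print(n_bins, roundtrip_error(heavy_tailed, n_bins))
# Distortion that stays large on heavy tails as n_bins grows would
# suggest cross-entropy is fitting a coarsened likelihood.
```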
minor comments (3)
- Clarify the precise probabilistic metrics (CRPS, quantile loss, etc.) and their normalization; state whether they are computed on the original scale or the quantized tokens (a sample-based CRPS estimator is sketched after this list).
- In result tables, report both mean and standard deviation across random seeds or cross-validation folds for the largest Chronos model.
- Add a short paragraph comparing the chosen T5 architecture and training recipe to prior time-series transformer work (e.g., Informer, Autoformer) to highlight the novelty of the quantization-plus-LM approach.
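On the first minor comment, the requested metric is easy to pin down. An empirical CRPS estimator, computed on the original scale after detokenizing forecasts, could look like this (the forecast samples below are hypothetical):

```python
import numpy as np

def crps_samples(forecast_samples, y):
    # Empirical CRPS for one observation: E|X - y| - 0.5 * E|X - X'|
    # with X, X' i.i.d. forecast draws (Gneiting & Raftery, 2007).
    x = np.asarray(forecast_samples, dtype=float)
    term1 = np.mean(np.abs(x - y))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

samples = np.random.default_rng(1).normal(10.0, 2.0, size=500)
print(crps_samples(samples, 11.0))  # hypothetical forecast vs. outcome
```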
Simulated Author's Rebuttal
Thank you for the detailed and insightful review of our paper on Chronos. We have carefully considered each of the major comments and provide point-by-point responses below. We will incorporate the suggested changes into the revised manuscript to address the concerns regarding experimental transparency, tokenization validation, and synthetic data ablation.
Point-by-point responses
-
Referee: [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.
Authors: We agree that an explicit partition is necessary to substantiate the zero-shot claims and rule out any possibility of leakage. In the revised manuscript, we will add a dedicated table in Section 4 that enumerates all 42 datasets, indicates precisely which ones were included in the pretraining corpus, and confirms that the zero-shot evaluation sets have zero overlap with the pretraining data. This will clearly separate the in-distribution and out-of-distribution results. revision: yes
-
Referee: [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood.
Authors: We recognize that validating the discretization step is important for the probabilistic interpretation. While the current implementation uses a vocabulary size of 4096 with per-series min-max scaling, we will add an ablation study in the revised version. This will compare vocabulary sizes (1024, 4096, 16384) and alternative scaling rules, with specific analysis on high-dynamic-range and heavy-tailed series subsets using CRPS and coverage to confirm that the learned distributions remain faithful rather than coarsened. revision: yes
-
Referee: [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.
Authors: We agree that a controlled ablation is required to quantify the contribution of the GP synthetic data. In the revision, we will include a direct comparison of models trained with and without the GP-augmented data, reporting zero-shot CRPS and coverage metrics on the full benchmark. This will demonstrate the net effect on non-stationary real-world series and address the concern about the stationary GP prior (a minimal generation sketch follows these responses). revision: yes
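To make the proposed ablation concrete, here is a minimal sketch of Gaussian-process synthetic series generation in this spirit (the kernel bank, parameter ranges, and mixing rule are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def sample_gp_series(length=256, rng=None):
    # Draw one synthetic series from a GP prior whose kernel is a
    # random combination of an RBF and an optional periodic component.
    rng = rng or np.random.default_rng()
    t = np.linspace(0.0, 1.0, length)[:, None]
    d = t - t.T                              # pairwise time offsets
    ell = rng.uniform(0.05, 0.5)             # RBF length scale
    k = np.exp(-0.5 * (d / ell) ** 2)
    if rng.random() < 0.5:                   # sometimes add seasonality
        period = rng.uniform(0.1, 0.5)
        k = k + np.exp(-2.0 * np.sin(np.pi * d / period) ** 2)
    k += 1e-6 * np.eye(length)               # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(length), k)

synthetic = [sample_gp_series(rng=np.random.default_rng(i)) for i in range(4)]
```

Training runs with and without such series, scored by zero-shot CRPS and coverage, would isolate the augmentation's net contribution.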
Circularity Check
No significant circularity in Chronos derivation chain
Full rationale
The paper's core derivation consists of tokenizing time series via scaling and quantization into a fixed vocabulary, then training T5-based transformers with cross-entropy loss on a mix of public datasets and GP-generated synthetic data. Performance claims rest on empirical benchmarks across 42 datasets, explicitly distinguishing in-corpus results from zero-shot evaluation on held-out datasets never used in pretraining. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citations justify uniqueness or ansatzes, and no renaming of known results occurs. The framework is self-contained through standard pretraining and independent evaluation.
Axiom & Free-Parameter Ledger
free parameters (2)
- quantization vocabulary size
- per-series scaling rule
axioms (1)
- domain assumption: Time series values can be represented, with negligible loss for forecasting purposes, by scaling followed by quantization into a fixed vocabulary.
Forward citations
Cited by 26 Pith papers
-
TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.
-
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...
-
FactoryBench: Evaluating Industrial Machine Understanding
FactoryBench reveals that frontier LLMs achieve under 50% on structured causal questions and under 18% on decision-making in industrial robotic telemetry.
-
Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning
xMAE pretrains biosignal representations via masked cross-modal reconstruction of temporally ordered signals like ECG and PPG, outperforming baselines on 15 of 19 downstream tasks including cardiovascular prediction a...
-
Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.
-
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
A model-agnostic adaptive conformal anomaly detection approach uses weighted quantile bounds learned from past foundation model predictions to deliver interpretable p-value scores with stable calibration under shifts ...
-
TempusBench: An Evaluation Framework for Time-Series Forecasting
TempusBench is a new evaluation framework for time-series forecasting models that supplies fresh non-overlapping datasets, tasks beyond horizon and domain, consistent tuning across models, and visualization tools.
-
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
-
LoRM: Learning the Language of Rotating Machinery for Self-Supervised Condition Monitoring
LoRM is a self-supervised framework that models multi-modal rotating machinery signals as token sequences for prediction with fine-tuned language models, using prediction errors to monitor machine health in real time.
-
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
-
HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series
HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...
-
HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series
HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...
-
RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.
-
Continuity Laws for Sequential Models
S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.
-
Can Transformers predict system collapse in dynamical systems?
Transformers fail to predict catastrophic collapse in unseen parameter regimes of nonlinear dynamical systems, while reservoir computing reliably succeeds.
-
FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
-
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...
-
Predicting Power-System Dynamic Trajectories with Foundation Models
LASS-ODE-Power is a pretrained model that predicts power-system dynamic trajectories across regimes in a zero-shot manner after large-scale ODE pretraining and targeted fine-tuning.
-
FM-CAC: Carbon-Aware Control for Battery-Buffered Edge AI via Time-Series Foundation Models
FM-CAC uses battery buffering and time-series foundation models for zero-shot carbon forecasting in a dynamic programming optimizer to reduce edge AI carbon emissions by up to 65.6% with near-maximum accuracy.
-
A Quantum Inspired Variational Kernel and Explainable AI Framework for Cross Region Solar and Wind Energy Forecasting
A hybrid classical-plus-quantum-inspired framework for cross-region renewable energy forecasting matches top baselines within 1% accuracy and separates calm versus stormy conditions with a 15-fold higher Fisher discri...
-
Sessa: Selective State Space Attention
Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
-
Wearable AI in the Era of Large Sensor Models
Large Sensor Models trained on large-scale multimodal wearable data can provide a scalable, general framework for wearable AI by learning transferable representations across modalities and tasks.
-
Thermal-GEMs: Generalized Models for Building Thermal Dynamics
Multi-source transfer learning for building thermal dynamics yields up to 63% lower forecasting errors than single-source models and outperforms time series foundation models when pretrained on 16-32 buildings over one year.
-
Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook
The survey organizes foundation models for sensor-based HAR into a lifecycle taxonomy and identifies three trajectories: HAR-specific models from scratch, adaptation of general time-series models, and integration with...
-
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.
-
Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction
Chronos encodes frequency content in decoder representations with quality that varies across the spectrum, as revealed by minimum description length probes on sinusoid inputs.
Reference graph
Works this paper leans on
-
[1]
GluonTS: Probabilistic and Neural Time Series Modeling in Python
Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, et al. GluonTS: Probabilistic and Neural Time Series Modeling in Python. The Journal of Machine Learning Research, 21(1): 4629--4634, 2020
work page 2020
-
[2]
Deep Explicit Duration Switching Models for Time Series
Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J Smola, Bernie Wang, and Tim Januschowski. Deep Explicit Duration Switching Models for Time Series . Advances in Neural Information Processing Systems, 34, 2021
work page 2021
-
[3]
Neural continuous-discrete state space models for irregularly-sampled time series
Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. Neural continuous-discrete state space models for irregularly-sampled time series. In International Conference on Machine Learning, pp. 926--951. PMLR, 2023
work page 2023
-
[4]
The theta model: a decomposition approach to forecasting
V. Assimakopoulos and K. Nikolopoulos. The theta model: a decomposition approach to forecasting. International Journal of Forecasting, 16(4): 521--530, 2000
work page 2000
-
[5]
The tourism forecasting competition
George Athanasopoulos, Rob J. Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting competition. International Journal of Forecasting, 27(3): 822--844, 2011
work page 2011
-
[6]
Deep learning for time series forecasting: Tutorial and literature survey
Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, François-Xavier Aubet, Laurent Callot, and Tim Januschowski. Deep learning for time series forecasting: Tutorial and literature survey. ACM Comput. Surv., 55(6), 2022
work page 2022
-
[7]
Multi-objective model selection for time series forecasting
Oliver Borchert, David Salinas, Valentin Flunkert, Tim Januschowski, and Stephan Günnemann. Multi-objective model selection for time series forecasting. arXiv:2202.08485, 2022
-
[8]
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...
work page 2020
-
[9]
Neural Contextual Anomaly Detection for Time Series
Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. Neural Contextual Anomaly Detection for Time Series. arXiv:2107.07702, 2021
-
[10]
N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting
Cristian Challu, Kin G Olivares, Boris N Oreshkin, Federico Garza Ramirez, Max Mergenthaler Canseco, and Artur Dubrawski. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting . In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023
work page 2023
-
[11]
A neural network approach to ordinal regression
Jianlin Cheng, Zheng Wang, and Gianluca Pollastri. A neural network approach to ordinal regression. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp. 1279--1284. IEEE, 2008
work page 2008
-
[12]
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research, 24(240): 1--113, 2023
work page 2023
-
[13]
Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling Instruction-Finetuned Language Models . arXiv:2210.11416, 2022
work page 2022
-
[14]
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning . arXiv:2307.08691, 2023
work page 2023
-
[15]
TSMix: time series data augmentation by mixing sources
Luke Nicholas Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, and Adam Barker. TSMix: time series data augmentation by mixing sources. In Proceedings of the 3rd Workshop on Machine Learning and Systems, pp. 109--114, 2023
work page 2023
-
[16]
A decoder-only foundation model for time-series forecasting
Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv:2310.10688, 2023
-
[17]
The UCR Time Series Classification Archive, October 2018
Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. The UCR Time Series Classification Archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
work page 2018
-
[18]
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale . arXiv:2208.07339, 2022
work page 2022
-
[19]
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling . arXiv:2302.00861, 2023
-
[20]
ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, and Colin White. ForecastPFN: Synthetically-Trained Zero-Shot Forecasting . In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[21]
Structure Discovery in Nonparametric Regression through Compositional Kernel Search
David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In International Conference on Machine Learning, pp. 1166--1174. PMLR, 2013
work page 2013
-
[22]
BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting
Patrick Emami, Abhijeet Sahu, and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. arXiv:2307.00142, 2023
-
[23]
Hierarchical Neural Story Generation
Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical Neural Story Generation . arXiv:1805.04833, 2018
-
[24]
Stop regressing: Training value functions via classification for scalable deep RL
Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, et al. Stop regressing: Training value functions via classification for scalable deep RL. arXiv:2403.03950, 2024
-
[25]
How not to lie with statistics: the correct way to summarize benchmark results
Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29(3): 218--221, 1986
work page 1986
-
[26]
Beam Search Strategies for Neural Machine Translation
Markus Freitag and Yaser Al-Onaizan. Beam Search Strategies for Neural Machine Translation . arXiv:1702.01806, 2017
-
[27]
Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding , November 2023. URL https://lmsys.org/blog/2023-11-21-lookahead-decoding/
work page 2023
-
[28]
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling . arXiv:2101.00027, 2020
work page 2020
- [29]
-
[30]
Probabilistic Forecasting with Spline Quantile Function RNNs
Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. ...
work page 2019
-
[31]
Strictly proper scoring rules, prediction, and estimation
Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477): 359--378, 2007
work page 2007
-
[32]
Monash Time Series Forecasting Archive
Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash Time Series Forecasting Archive. In Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
work page 2021
-
[33]
Moment: A family of open time-series foundation models
Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024
-
[34]
Large Language Models Are Zero-Shot Time Series Forecasters
Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large Language Models Are Zero-Shot Time Series Forecasters . In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[35]
The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv:1904.09751, 2019
work page 2019
-
[36]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models . In International Conference on Learning Representations, 2022
work page 2022
-
[37]
Transformer-based deep survival analysis
Shi Hu, Egill Fridgeirsson, Guido van Wingen, and Max Welling. Transformer-based deep survival analysis. In Survival Prediction-Algorithms, Challenges and Applications, pp. 132--148. PMLR, 2021
work page 2021
-
[38]
Forecasting with exponential smoothing: the state space approach
Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008
work page 2008
-
[39]
Forecasting: principles and practice
Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2018
work page 2018
-
[40]
Another look at measures of forecast accuracy
Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International journal of forecasting, 22(4): 679--688, 2006
work page 2006
-
[41]
Deep learning for time series classification: a review
Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. Deep learning for time series classification: a review. Data mining and knowledge discovery, 33(4): 917--963, 2019
work page 2019
-
[42]
Time-LLM: Time series forecasting by reprogramming large language models
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogramming large language models. In The Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[43]
Domain adaptation for time series forecasting via attention sharing
Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In International Conference on Machine Learning, pp. 10280--10297. PMLR, 2022
work page 2022
-
[44]
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree . Advances in neural information processing systems, 30, 2017
work page 2017
-
[45]
Quantile regression
Roger Koenker and Kevin F Hallock. Quantile regression. Journal of economic perspectives, 15(4): 143--156, 2001
work page 2001
-
[46]
A classification of business forecasting problems
Stephan Kolassa and Tim Januschowski. A classification of business forecasting problems. Foresight, 52, 2019
work page 2019
-
[47]
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Wang. Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting. In Advances in Neural Information Processing Systems, volume 36, pp. 28341--28364. Curran Associates, Inc., 2023
work page 2023
-
[48]
Fast inference from transformers via speculative decoding
Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning, pp. 19274--19286. PMLR, 2023
work page 2023
-
[49]
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv:1910.13461, 2019
work page 2019
-
[50]
Temporal fusion transformers for interpretable multi-horizon time series forecasting
Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4): 1748--1764, 2021
work page 2021
-
[51]
LargeST: A benchmark dataset for large-scale traffic forecasting
Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. LargeST: A benchmark dataset for large-scale traffic forecasting. arXiv:2306.08259, 2023
-
[52]
The M3-Competition: results, conclusions and implications
Spyros Makridakis and Michele Hibon. The M3-Competition: results, conclusions and implications. International journal of forecasting, 16(4): 451--476, 2000
work page 2000
-
[53]
Accuracy of forecasting: An empirical investigation
Spyros Makridakis, Michele Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society. Series A (General), 142(2): 97--145, 1979
work page 1979
-
[54]
The M4 Competition: 100,000 time series and 61 forecasting methods
Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1): 54--74, 2020
work page 2020
-
[55]
M5 accuracy competition: Results, findings, and conclusions
Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38(4): 1346--1364, 2022
work page 2022
-
[56]
Regression models for ordinal data
Peter McCullagh. Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2): 109--127, 1980
work page 1980
-
[57]
Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv:1609.07843, 2016
work page 2016
-
[58]
Large language models as general pattern machines
Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, and Andy Zeng. Large language models as general pattern machines. In Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, pp. 2498--2518. PMLR, 2023
work page 2023
-
[59]
A time series is worth 64 words: Long-term forecasting with transformers
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023
work page 2023
-
[60]
NeuralForecast: User friendly state-of-the-art neural forecasting models
Kin G. Olivares, Cristian Challú, Federico Garza, Max Mergenthaler Canseco, and Artur Dubrawski. NeuralForecast: User friendly state-of-the-art neural forecasting models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/neuralforecast
work page 2022
-
[61]
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv:1609.03499, 2016
work page 2016
-
[62]
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2020
work page 2020
-
[63]
Meta-learning framework with applications to zero-shot time-series forecasting
Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021
work page 2021
-
[64]
Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks
Bernardo Pérez Orozco and Stephen J. Roberts. Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks. In 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 503--508, 2020
work page 2020
-
[65]
Learning quantile functions without quantile crossing for distribution-free time series forecasting
Youngsuk Park, Danielle Maddix, François-Xavier Aubet, Kelvin Kan, Jan Gasthaus, and Yuyang Wang. Learning quantile functions without quantile crossing for distribution-free time series forecasting. In International Conference on Artificial Intelligence and Statistics, pp. 8127--8150. PMLR, 2022
work page 2022
-
[66]
A simple combination of univariate models
Fotios Petropoulos and Ivan Svetunkov. A simple combination of univariate models. International journal of forecasting, 36(1): 110--115, 2020
work page 2020
-
[67]
The effectiveness of discretization in forecasting: An empirical study on neural time series models
Stephan Rabanser, Tim Januschowski, Valentin Flunkert, David Salinas, and Jan Gasthaus. The effectiveness of discretization in forecasting: An empirical study on neural time series models. arXiv:2005.10111, 2020
-
[68]
Language models are unsupervised multitask learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8): 9, 2019
work page 2019
-
[69]
Exploring the limits of transfer learning with a unified text-to-text transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1): 5485--5551, 2020
work page 2020
-
[70]
Integrating multimodal information in large pretrained transformers
Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, and Ehsan Hoque. Integrating multimodal information in large pretrained transformers. In Proceedings of the conference. Association for Computational Linguistics. Meeting, volume 2020, pp. 2359. NIH Public Access, 2020
work page 2020
-
[71]
Deep state space models for time series forecasting
Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018
work page 2018
-
[72]
Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting
Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International Conference on Machine Learning, pp. 8857--8868. PMLR, 2021
work page 2021
-
[73]
Lag-Llama: Towards foundation models for time series forecasting
Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-Llama: Towards foundation models for time series forecasting, 2023
work page 2023
-
[74]
Conformalized quantile regression
Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. Advances in neural information processing systems, 32, 2019
work page 2019
-
[75]
Latent ordinary differential equations for irregularly-sampled time series
Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems, 32, 2019
work page 2019
-
[76]
DeepAR: Probabilistic forecasting with autoregressive recurrent networks
David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3): 1181--1191, 2020
work page 2020
-
[77]
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv:1508.07909, 2015
work page 2015
-
[78]
AutoGluon--TimeSeries: AutoML for probabilistic time series forecasting
Oleksandr Shchur, Ali Caner Turkmen, Nick Erickson, Huibin Shen, Alexander Shirkov, Tony Hu, and Bernie Wang. AutoGluon--TimeSeries: AutoML for probabilistic time series forecasting. In International Conference on Automated Machine Learning, pp. 9--1. PMLR, 2023
work page 2023
-
[79]
Conformal time-series forecasting
Kamile Stankeviciute, Ahmed M Alaa, and Mihaela van der Schaar. Conformal time-series forecasting. Advances in neural information processing systems, 34: 6216--6228, 2021
work page 2021
-
[80]
Regression as classification: Influence of task formulation on neural network features
Lawrence Stewart, Francis Bach, Quentin Berthet, and Jean-Philippe Vert. Regression as classification: Influence of task formulation on neural network features. In International Conference on Artificial Intelligence and Statistics, pp. 11563--11582. PMLR, 2023
work page 2023