arxiv: 2403.07815 · v3 · submitted 2024-03-12 · 💻 cs.LG · cs.AI

Chronos: Learning the Language of Time Series

Pith reviewed 2026-05-13 08:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords time series forecasting · pretrained models · transformer · zero-shot learning · probabilistic forecasting · tokenization · synthetic data · T5

The pith

Pretrained transformers match, and occasionally beat, dataset-specific models on unseen time series forecasting tasks after tokenizing values into a fixed vocabulary.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chronos shows that time series forecasting can be reframed as a language modeling problem by first scaling and quantizing the numerical values into discrete tokens drawn from a fixed vocabulary. Standard transformer architectures from the T5 family are then trained on this tokenized data with the cross-entropy loss, using a large mix of public real-world datasets plus synthetic series generated from Gaussian processes. The resulting models significantly outperform baselines on datasets that were part of the training corpus and deliver comparable, occasionally better, zero-shot accuracy on entirely new datasets than methods trained specifically on those datasets. This matters because it offers a route to general forecasting tools that avoid building and retraining a fresh model for every new task or domain.

Core claim

Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. Models based on the T5 family, ranging from 20M to 710M parameters, are pretrained on a large collection of publicly available datasets augmented by Gaussian-process synthetic data. In a benchmark of 42 datasets, these models significantly outperform other methods on training-corpus data and achieve comparable or occasionally superior zero-shot performance on new datasets relative to methods trained specifically on them.
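As a rough illustration of the scaling-and-quantization step, the sketch below scales a series by its mean absolute value, assigns each scaled value to one of a fixed number of uniform bins, and maps tokens back to values. It is a minimal sketch, not the authors' code: the scaling rule, the bin range of [-15, 15], the count of 4094 bins, and all function names are assumptions chosen for the example, and special tokens such as padding or end-of-sequence are left out.

```python
from bisect import bisect_right


def mean_scale(series):
    """Divide by the mean absolute value (assumed scaling rule)."""
    scale = sum(abs(x) for x in series) / len(series)
    if scale == 0:
        scale = 1.0  # avoid division by zero for an all-zero series
    return [x / scale for x in series], scale


def make_bins(low=-15.0, high=15.0, num_bins=4094):
    """Uniform bin centers plus the interior edges between bins (assumed layout)."""
    width = (high - low) / num_bins
    centers = [low + (i + 0.5) * width for i in range(num_bins)]
    edges = [low + i * width for i in range(1, num_bins)]
    return centers, edges


def quantize(scaled, edges):
    """Map each scaled value to a token id; out-of-range values hit the end bins."""
    return [bisect_right(edges, x) for x in scaled]


def dequantize(tokens, centers, scale):
    """Map token ids back to the original scale via the bin centers."""
    return [centers[t] * scale for t in tokens]


series = [10.0, 12.0, 8.0, 11.0, 9.0]
scaled, scale = mean_scale(series)
centers, edges = make_bins()
tokens = quantize(scaled, edges)
restored = dequantize(tokens, centers, scale)
```

Values outside the assumed bin range collapse onto the first or last token, which is the clipping behaviour behind the referee's concern about high-dynamic-range series.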

What carries the argument

Tokenization of continuous time series values into a fixed vocabulary through scaling and quantization, which converts the forecasting problem into standard language-model training with cross-entropy loss.
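Once values are tokens, the training objective is the ordinary categorical cross-entropy over the vocabulary. The sketch below computes it for a toy vocabulary of four tokens; the logits, the targets, and the function names are made up for illustration and do not come from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits_per_step, target_tokens):
    """Mean negative log-likelihood of the target token ids."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_tokens):
        total -= log_softmax(logits)[target]
    return total / len(target_tokens)


# Toy example: vocabulary of 4 tokens, 2 forecast steps.
logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
targets = [0, 3]
loss = cross_entropy(logits, targets)
```

The loss treats bins as unordered classes: predicting a neighbouring bin is penalised the same way as predicting a distant one with equal probability, so any notion of numerical closeness has to be learned from data.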

Load-bearing premise

Scaling and quantization into a fixed vocabulary must preserve enough information to support accurate probabilistic forecasts, and the added Gaussian-process synthetic data must improve rather than harm generalization to real-world distributions.
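To make the synthetic-data half of this premise concrete, the sketch below draws one series from a zero-mean Gaussian process. It is only a sketch under stated assumptions: this page does not say which kernels the paper uses, so the single squared-exponential kernel, its length scale, the jitter value, and the function names are all choices made for the example.

```python
import math
import random


def rbf_kernel(n, length_scale=5.0, variance=1.0):
    """Squared-exponential covariance matrix over time steps 0..n-1."""
    return [[variance * math.exp(-0.5 * ((i - j) / length_scale) ** 2)
             for j in range(n)] for i in range(n)]


def cholesky(matrix, jitter=1e-6):
    """Lower-triangular factor of (matrix + jitter * I)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_gp(n, seed=0, length_scale=5.0, variance=1.0):
    """Draw one length-n sample as L @ z with z standard normal."""
    rng = random.Random(seed)
    lower = cholesky(rbf_kernel(n, length_scale, variance))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


series = sample_gp(64, seed=42)
```

A kernel like this one yields smooth, stationary series, which is why the question of how such data transfers to non-stationary real-world series is worth testing rather than assuming.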

What would settle it

If, across new real-world datasets never seen in pretraining, Chronos models used zero-shot consistently produced substantially higher error than models trained from scratch on those same datasets, the claim of effective zero-shot transfer would be falsified.

original abstract

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces Chronos, a framework that tokenizes time series values via scaling and quantization into a fixed vocabulary and trains T5-family transformer models (20M–710M parameters) with cross-entropy loss. Models are pretrained on a large collection of public datasets augmented by Gaussian-process synthetic data; a benchmark on 42 datasets shows significant outperformance versus classical and deep-learning baselines on in-distribution tasks and comparable or superior zero-shot performance on unseen datasets.

Significance. If the central performance claims hold after verification of data integrity, the work establishes that language-model pretraining can be effectively transferred to probabilistic time series forecasting, leveraging cross-domain data to simplify pipelines and improve zero-shot accuracy. The scale of the benchmark (42 datasets, multiple model sizes) and explicit comparison to both local and deep baselines are strengths; the approach also supplies a concrete, reproducible tokenization recipe that could serve as a baseline for future pretrained time-series models.

major comments (3)
  1. [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.
  2. [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood.
  3. [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.
minor comments (3)
  1. Clarify the precise probabilistic metrics (CRPS, quantile loss, etc.) and their normalization; state whether they are computed on the original scale or the quantized tokens.
  2. In result tables, report both mean and standard deviation across random seeds or cross-validation folds for the largest Chronos model.
  3. Add a short paragraph comparing the chosen T5 architecture and training recipe to prior time-series transformer work (e.g., Informer, Autoformer) to highlight the novelty of the quantization-plus-LM approach.
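The first minor comment asks which probabilistic metrics are used and on which scale. As a reference point, the sketch below gives the standard sample-based CRPS estimate, E|X - y| - 0.5 E|X - X'|, and the pinball (quantile) loss for a single quantile level. The function names are illustrative, and nothing here is taken from the paper's evaluation code; if metrics are meant to be on the original scale, sampled tokens would first need to be mapped back to values before scoring.

```python
def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    spread_to_obs = sum(abs(s - observed) for s in samples) / n
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return spread_to_obs - 0.5 * spread_within


def pinball_loss(quantile_forecast, observed, q):
    """Quantile loss at level q for a single forecast and observation."""
    diff = observed - quantile_forecast
    return max(q * diff, (q - 1.0) * diff)


# With a single sample, CRPS reduces to the absolute error.
point_score = crps_from_samples([3.0], 5.0)
two_sample_score = crps_from_samples([1.0, 3.0], 2.0)
```

Both scores are in the units of the series, so any normalization (for example by the mean absolute target value) has to be stated separately for the numbers to be comparable across datasets.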

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the detailed and insightful review of our paper on Chronos. We have carefully considered each of the major comments and provide point-by-point responses below. We will incorporate the suggested changes into the revised manuscript to address the concerns regarding experimental transparency, tokenization validation, and synthetic data ablation.

point-by-point responses
  1. Referee: [§4] Experimental setup (likely §4): the manuscript must explicitly list which of the 42 benchmark datasets appear in the pretraining corpus and confirm zero overlap with the zero-shot test sets. Without this partition table, leakage cannot be ruled out and the zero-shot superiority claim cannot be evaluated.

    Authors: We agree that an explicit partition is necessary to substantiate the zero-shot claims and rule out any possibility of leakage. In the revised manuscript, we will add a dedicated table in Section 4 that enumerates all 42 datasets, indicates precisely which ones were included in the pretraining corpus, and confirms that the zero-shot evaluation sets have zero overlap with the pretraining data. This will clearly separate the in-distribution and out-of-distribution results.
    revision: yes

  2. Referee: [§3.1] Tokenization procedure (likely §3.1–3.2): scaling followed by fixed-vocabulary quantization is load-bearing for the probabilistic claim. The paper should report an ablation on vocabulary size (and the per-series scaling rule) showing that discretization does not systematically bias the predictive distribution on high-dynamic-range or heavy-tailed series; otherwise the cross-entropy training may be optimizing a coarsened rather than faithful likelihood.

    Authors: We recognize that validating the discretization step is important for the probabilistic interpretation. While the current implementation uses a vocabulary size of 4096 with per-series mean scaling, we will add an ablation study in the revised version. This will compare vocabulary sizes (1024, 4096, 16384) and alternative scaling rules, with specific analysis of high-dynamic-range and heavy-tailed series subsets using CRPS and coverage to confirm that the learned distributions remain faithful rather than coarsened.
    revision: yes

  3. Referee: [§4.3] Synthetic-data ablation (likely §4.3): the Gaussian-process augmentation is presented as improving generalization, yet no controlled comparison (with vs. without GP data) on zero-shot CRPS or coverage is supplied. Because the GP prior is stationary and smooth, its net effect on real-world non-stationary series must be demonstrated rather than assumed.

    Authors: We agree that a controlled ablation is required to quantify the contribution of the GP synthetic data. In the revision, we will include a direct comparison of models trained with and without the GP-augmented data, reporting zero-shot CRPS and coverage metrics on the full benchmark. This will demonstrate the net effect on non-stationary real-world series and address the concern about the stationary GP prior.
    revision: yes
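The partition table promised in the first response lends itself to a mechanical check. The sketch below compares two lists of dataset names after light normalization and reports any name that appears on both sides. The dataset names are hypothetical placeholders and the normalization rule is an assumption; it is not a procedure described in the paper.

```python
def normalize(name):
    """Canonical form so that spelling variants of one dataset compare equal."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def find_overlap(pretraining, zero_shot):
    """Return the sorted dataset names present in both lists."""
    seen = {normalize(n) for n in pretraining}
    return sorted({normalize(n) for n in zero_shot} & seen)


# Hypothetical names, used only to exercise the check.
pretraining = ["Dataset A", "dataset_b", "dataset-c"]
zero_shot = ["dataset_d", "DATASET_C "]
overlap = find_overlap(pretraining, zero_shot)  # flags "dataset_c"
```

A name-level check of this kind only catches explicit reuse. Two differently named datasets drawn from the same underlying source would pass it, so series-level provenance still needs to be documented.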

Circularity Check

0 steps flagged

No significant circularity in Chronos derivation chain

full rationale

The paper's core derivation consists of tokenizing time series via scaling and quantization into a fixed vocabulary, then training T5-based transformers with cross-entropy loss on a mix of public datasets and GP-generated synthetic data. Performance claims rest on empirical benchmarks across 42 datasets, explicitly distinguishing in-corpus results from zero-shot evaluation on held-out datasets never used in pretraining. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citations justify uniqueness or ansatzes, and no renaming of known results occurs. The framework is self-contained through standard pretraining and independent evaluation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that discrete tokenization of scaled values is sufficient for probabilistic forecasting and that synthetic GP data aids transfer without introducing harmful distribution shift.

free parameters (2)
  • quantization vocabulary size
    Number of discrete tokens chosen for the fixed vocabulary; value not stated in abstract.
  • per-series scaling rule
    Method and parameters used to normalize each series before quantization; not detailed in abstract.
axioms (1)
  • domain assumption: Time series values can be represented with negligible loss for forecasting purposes by scaling followed by quantization into a fixed vocabulary.
    This premise enables the language-model training pipeline and is invoked in the tokenization step described in the abstract.
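The quantization premise can be made quantitative under simple assumptions. Suppose, purely for illustration, that a series is divided by a scale s > 0, that the scaled values are assigned to B uniform bins covering [l, r], and that each token is decoded to its bin center c. None of these symbols come from the page; they are introduced here only to state the bound.

```latex
\tilde{x} = \frac{x}{s}, \qquad w = \frac{r - l}{B}, \qquad \hat{x} = s \, c_{q(\tilde{x})}

% Inside the binned range the round-trip error is at most half a bin width:
|x - \hat{x}| \le \frac{s\,w}{2} \quad \text{if } l \le \tilde{x} \le r

% Above the range the value is clipped to the last bin and the error grows linearly:
|x - \hat{x}| = s\left(\tilde{x} - r + \frac{w}{2}\right) \quad \text{if } \tilde{x} > r
```

Under these assumptions both free parameters in the ledger enter the bound directly: a larger vocabulary shrinks w, while the scaling rule determines s and how often scaled values fall outside [l, r].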

pith-pipeline@v0.9.0 · 5560 in / 1311 out tokens · 45903 ms · 2026-05-13T08:21:55.992531+00:00 · methodology


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management

    cs.AI 2026-06 unverdicted novelty 7.0

    A three-phase DRL framework for personalized portfolio management using a ticker-free encoder pretrained with a time series foundation model, an objective-conditioned MoE actor-critic, and inference-time LoRA adaptati...

  2. The Simulacrum: Decision-Theoretic Pretraining for Near-Optimal Time-Series Forecasting and Inference

    cs.LG 2026-06 unverdicted novelty 7.0

    Neural networks pretrained via stratified simulations from user-specified generative models approximate optimal decision rules for time-series forecasting and inference, outperforming MLE and AICc on structural models...

  3. UC-Search: Risk-Aware Test-Time Search for Delayed Constrained Time-Series Control

    cs.LG 2026-06 unverdicted novelty 7.0

    UC-Search is a model-agnostic test-time wrapper that adds feasibility-automaton search and uncertainty-based risk adjustment to produce better delayed constrained control than CEM, MPPI, and risk-random baselines on p...

  4. Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

    stat.ML 2026-06 unverdicted novelty 7.0

    Decision-calibrated conformal uncertainty for pacing uses the support function of the signed policy sensitivity set to achieve smaller uncertainty radii on public datasets.

  5. GNSS-FM: A Self-Supervised Foundation Model for Daily GNSS Displacement Time Series

    physics.geo-ph 2026-06 unverdicted novelty 7.0

    GNSS-FM is a self-supervised foundation model for GNSS displacement time series that outperforms task-specific baselines on 90-day forecasting and seismic step localization after pretraining on global station data.

  6. VZCrash: A Large-Scale IMU Dataset of Ego-Vehicle Crashes

    cs.CV 2026-06 unverdicted novelty 7.0

    Introduces VZCrash, the largest public IMU dataset for ego-vehicle crashes, and shows through benchmarks that larger data scale improves crash detection models especially for real-world deployment.

  7. Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty

    cs.LG 2026-06 conditional novelty 7.0

    MSE-optimal multi-step forecasters cannot match the marginal distribution of realizations under nonzero conditional uncertainty, creating a quantifiable accuracy-realism Pareto frontier across benchmarks.

  8. Olivia: Harmonizing Time Series Foundation Models with Power Spectral Density

    cs.LG 2026-05 unverdicted novelty 7.0

    Olivia harmonizes time series datasets via normalized power spectral density using a Harmonizer module and resonator-based HarmonicAttention, achieving state-of-the-art zero-shot, few-shot, and full-shot forecasting o...

  9. NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces

    cs.LG 2026-05 unverdicted novelty 7.0

    NeuroAtlas benchmarks foundation models on 42 EEG datasets and reports that EEG-specific models do not consistently outperform generic time-series models, standard metrics miss clinical utility, and rankings vary by domain.

  10. SurF: A Generative Model for Multivariate Irregular Time Series Forecasting

    cs.LG 2026-05 unverdicted novelty 7.0

    SurF applies the Time Rescaling Theorem as a learnable bijection to create a single generative model for forecasting irregular multivariate event streams that outperforms or matches baselines on six benchmarks.

  11. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 7.0

    HEPA pretrains via horizon-conditioned JEPA on unlabeled data then fine-tunes only the predictor for event survival CDFs, outperforming PatchTST, iTransformer, MAE and Chronos-2 on at least 10 of 14 benchmarks with fi...

  12. TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning

    cs.AI 2026-05 unverdicted novelty 7.0

    TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.

  13. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...

  14. FactoryBench: Evaluating Industrial Machine Understanding

    cs.AI 2026-05 unverdicted novelty 7.0

    FactoryBench reveals that frontier LLMs achieve under 50% on structured causal questions and under 18% on decision-making in industrial robotic telemetry.

  15. Physiology-Aware Masked Cross-Modal Reconstruction for Biosignal Representation Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    xMAE pretrains biosignal representations via masked cross-modal reconstruction of temporally ordered signals like ECG and PPG, outperforming baselines on 15 of 19 downstream tasks including cardiovascular prediction a...

  16. Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models

    cs.LG 2026-04 unverdicted novelty 7.0

    Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.

  17. Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring

    cs.LG 2026-04 unverdicted novelty 7.0

    A model-agnostic adaptive conformal anomaly detection approach uses weighted quantile bounds learned from past foundation model predictions to deliver interpretable p-value scores with stable calibration under shifts ...

  18. TempusBench: An Evaluation Framework for Time-Series Forecasting

    cs.LG 2026-04 unverdicted novelty 7.0

    TempusBench is a new evaluation framework for time-series forecasting models that supplies fresh non-overlapping datasets, tasks beyond horizon and domain, consistent tuning across models, and visualization tools.

  19. TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

    cs.AI 2026-04 conditional novelty 7.0

    TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.

  20. LoRM: Learning the Language of Rotating Machinery for Self-Supervised Condition Monitoring

    cs.CL 2026-04 unverdicted novelty 7.0

    LoRM is a self-supervised framework that models multi-modal rotating machinery signals as token sequences for prediction with fine-tuned language models, using prediction errors to monitor machine health in real time.

  21. Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting

    cs.LG 2025-09 unverdicted novelty 7.0

    Super-Linear introduces a pretrained MoE architecture using frequency-specialized linear experts and spectral gating for efficient general time series forecasting.

  22. Sundial: A Family of Highly Capable Time Series Foundation Models

    cs.LG 2025-02 conditional novelty 7.0

    Sundial uses TimeFlow Loss for native pre-training of Transformers on continuous time series from TimeBench, achieving SOTA point and probabilistic forecasting with millisecond inference.

  23. TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

    cs.LG 2024-10 unverdicted novelty 7.0

    TS-Reasoner is a domain-oriented agent using LLMs, computational tools, and error feedback for multi-step time series inference, showing better performance than general LLMs on understanding and reasoning benchmarks.

  24. Deep Time Series Models: A Comprehensive Survey and Benchmark

    cs.LG 2024-07 unverdicted novelty 7.0

    This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.

  25. Probabilistic Low-Voltage Peak Load Forecasting with Time Series Foundation Models Evaluated on Application-Oriented Metrics

    cs.LG 2026-07 unverdicted novelty 6.0

    Compares foundation models for probabilistic low-voltage load forecasting on 200 real feeders and introduces a grid-planning metric that scores peak prediction by its effect on asset cost-risk decisions.

  26. Aionoscope: Debugging Latent-State Accessibility in Time-Series Representations

    cs.LG 2026-07 unverdicted novelty 6.0

    Aionoscope shows that time-series representations recover coarse signal types reliably but expose dense latent states like phase and amplitude much less reliably, with best dense-probe R² at 0.689 versus oracle 0.999.

  27. Domain-Informed Multi-View Self-Distillation for Astronomical Light-Curve Representation Learning with JEPA

    astro-ph.IM 2026-06 unverdicted novelty 6.0

    A JEPA-based model with domain-informed multi-view self-distillation learns light-curve representations that outperform hand-crafted features on 15 of 16 StarEmbed metrics and adapts competitively to other irregular t...

  28. MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

    cs.AI 2026-06 unverdicted novelty 6.0

    MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.

  29. When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting

    cs.LG 2026-06 unverdicted novelty 6.0

    GUARD distills from multiple misaligned time-series foundation models into lightweight forecasters via instance-wise routing and uncertainty-gated temperature, reducing RMSE on meteorology, carbon flux, soil moisture,...

  30. TimeRouter: Efficient and Adaptive Routing of Time-Series Foundation Models

    cs.LG 2026-06 unverdicted novelty 6.0

    TimeRouter routes among time-series foundation models via discriminative routing, selective gating and ensemble fallback, reporting SOTA LB MASE 0.6765 on GIFT-EVAL.

  31. LakeFM: Toward a Foundation Model for Aquatic Ecosystems Using Irregular Multivariate Multi-depth Time Series Data

    cs.LG 2026-06 unverdicted novelty 6.0

    LakeFM pre-trains on large ecological datasets to forecast irregular lake time series and reports competitive or superior performance with physically plausible outputs.

  32. Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting

    cs.LG 2026-06 unverdicted novelty 6.0

    Tyan-WP is a pretrained wind power foundation model that outperforms site-specific TSMs and generic LTSMs in zero-shot ultra-short-term probabilistic forecasting on U.S. and U.K. sites via static embeddings and PAMF module.

  33. Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

    cs.LG 2026-06 unverdicted novelty 6.0

    SFF smooths the non-convex loss landscape of pre-trained LTSMs by linear weight interpolation with a random model, enabling more effective fine-tuning while preserving pre-trained knowledge.

  34. GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks

    cs.LG 2026-06 unverdicted novelty 6.0

    GeoGNN is a two-tower GNN that learns geographic cell embeddings from adjacency graphs and matches them to temporal representations via dot-product similarity plus classification, improving geolocalization accuracy by...

  35. EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

    cs.AI 2026-06 unverdicted novelty 6.0

    EpiEvolve achieves 0.629 accuracy in streaming COVID-19 forecasting by using episodic memory, reflection on delayed labels, and regime-aware retrieval, outperforming static LLMs (0.561) and CDC ensembles (0.325) while...

  36. GITCO: Gated Inference-Time Context Optimization in TSFMs

    cs.AI 2026-06 unverdicted novelty 6.0

    GITCO delivers +1.95% average MASE reduction on TimesFM 2.5 across 53 datasets by gated inference-time suppression of anomalous patches, capturing 89.9% of the improvement upper bound.

  37. REGAIN: REconciliation GAIN-driven Auxiliary Direction Learning

    stat.ML 2026-06 unverdicted novelty 6.0

    REGAIN learns auxiliary directions that improve reconciled forecasts by optimizing their downstream effect on loss reduction rather than variance or predictability.

  38. Stationarity-Aware Retrieval-Augmented Time Series Forecasting

    cs.LG 2026-06 unverdicted novelty 6.0

    SARAF is a new retrieval-augmented framework for time series forecasting that uses temporal similarity followed by stationarity-modulated diversity selection and aggregation to improve accuracy under non-stationarity.

  39. Data-Driven Forecasting of three-Component Seismograms Using Transformer Architectures

    astro-ph.IM 2026-06 unverdicted novelty 6.0

    SeismoGPT is a transformer autoregressive model achieving median normalized cross-correlation above 0.93 when forecasting synthetic three-component seismograms up to 240 s ahead from P- and S-wave context.

  40. InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

    cs.LG 2026-05 unverdicted novelty 6.0

    InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.

  41. Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP

    astro-ph.IM 2026-05 unverdicted novelty 6.0

    Attentive Neural Processes outperform Gaussian Processes and neural networks on light curve interpolation quality, feature recovery, calibration, and speed for 15 transient classes under realistic Rubin cadences.

  42. AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

    cs.LG 2026-05 unverdicted novelty 6.0

    AME-TS is a structure-guided sparse MoE foundation model for time series that aligns expert routing with series-level temporal descriptors to achieve strong accuracy-efficiency tradeoffs on GIFT-Eval while improving s...

  43. Toto 2.0: Time Series Forecasting Enters the Scaling Era

    cs.LG 2026-05 unverdicted novelty 6.0

    Toto 2.0 is a family of open time series foundation models that demonstrates reliable scaling and sets new state-of-the-art results on three forecasting benchmarks.

  44. CTF4Nuclear: Common Task Framework for Nuclear Fission and Fusion Models

    cs.LG 2026-05 unverdicted novelty 6.0

    CTF4Nuclear proposes a common task framework for benchmarking ML methods on nuclear engineering datasets using 12 metrics and a new sparse-measurement system monitoring paradigm.

  45. MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

    cs.LG 2026-05 unverdicted novelty 6.0

    MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.

  46. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...

  47. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

    cs.LG 2026-05 unverdicted novelty 6.0

    HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...

  48. Robust Basis Spline Decoupling for the Compression of Transformer Models

    cs.LG 2026-05 unverdicted novelty 6.0

    The paper proposes a robust B-spline decoupling framework using constrained coupled matrix-tensor factorization and the R-CMTF-BSD algorithm for compressing Vision and Swin Transformer models.

  49. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    MarsTSC is a VLM agentic system with generator, reflector, and modifier roles that iteratively refines a knowledge bank to improve few-shot multimodal time series classification and produce human-readable explanations.

  50. RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

    cs.LG 2026-05 unverdicted novelty 6.0

    RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.

  51. Continuity Laws for Sequential Models

    cs.LG 2026-05 unverdicted novelty 6.0

    S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.

  52. PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift

    cs.LG 2026-05 unverdicted novelty 6.0

    PIMSM is a Mamba-based architecture that maps knee frequencies from spectra to multi-scale discretization parameters to reduce representation drift under distribution shifts in fMRI and weather forecasting.

  53. Can Transformers predict system collapse in dynamical systems?

    nlin.CD 2026-05 unverdicted novelty 6.0

    Transformers fail to predict catastrophic collapse in unseen parameter regimes of nonlinear dynamical systems, while reservoir computing reliably succeeds.

  54. FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

    cs.LG 2026-04 unverdicted novelty 6.0

    Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.

  55. Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

    cs.LG 2026-04 unverdicted novelty 6.0

    Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-devi...

  56. Predicting Power-System Dynamic Trajectories with Foundation Models

    cs.AI 2026-04 unverdicted novelty 6.0

    LASS-ODE-Power is a pretrained model that predicts power-system dynamic trajectories across regimes in a zero-shot manner after large-scale ODE pretraining and targeted fine-tuning.

  57. FM-CAC: Carbon-Aware Control for Battery-Buffered Edge AI via Time-Series Foundation Models

    eess.SY 2026-04 unverdicted novelty 6.0

    FM-CAC uses battery buffering and time-series foundation models for zero-shot carbon forecasting in a dynamic programming optimizer to reduce edge AI carbon emissions by up to 65.6% with near-maximum accuracy.

  58. A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks

    cs.LG 2026-03 unverdicted novelty 6.0

    iAmTime is a time-series foundation model that uses instruction-conditioned in-context learning from demonstrations to perform zero-shot adaptation on forecasting, imputation, classification, and related tasks.

  59. Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

    cs.AI 2026-03 unverdicted novelty 6.0

    Timer-S1 is a released 8.3B-parameter MoE time series model that achieves state-of-the-art MASE and CRPS scores on GIFT-Eval using serial scaling and Serial-Token Prediction.

  60. BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences

    cs.LG 2025-12 unverdicted novelty 6.0

    BERTO introduces a prompt-conditioned BERT framework for cellular traffic forecasting that uses a balancing loss to enable flexible trade-offs between power consumption and SLA violations using natural language inputs.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · cited by 89 Pith papers · 12 internal anchors

  1. [1]

    GluonTS: Probabilistic and Neural Time Series Modeling in Python

    Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, et al. GluonTS: Probabilistic and Neural Time Series Modeling in Python. The Journal of Machine Learning Research, 21(1):4629--4634, 2020

  2. [2]

    Deep Explicit Duration Switching Models for Time Series

    Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J Smola, Bernie Wang, and Tim Januschowski. Deep Explicit Duration Switching Models for Time Series. Advances in Neural Information Processing Systems, 34, 2021

  3. [3]

    Neural continuous-discrete state space models for irregularly-sampled time series

    Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. Neural continuous-discrete state space models for irregularly-sampled time series. In International Conference on Machine Learning, pp. 926--951. PMLR, 2023

  4. [4]

    The theta model: a decomposition approach to forecasting

    V. Assimakopoulos and K. Nikolopoulos. The theta model: a decomposition approach to forecasting. International Journal of Forecasting, 16(4):521--530, 2000

  5. [5]

    The tourism forecasting competition

    George Athanasopoulos, Rob J. Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting competition. International Journal of Forecasting, 27(3):822--844, 2011

  6. [6]

    Deep learning for time series forecasting: Tutorial and literature survey

    Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, François-Xavier Aubet, Laurent Callot, and Tim Januschowski. Deep learning for time series forecasting: Tutorial and literature survey. ACM Comput. Surv., 55(6), 2022

  7. [7]

    Multi-objective model selection for time series forecasting

    Oliver Borchert, David Salinas, Valentin Flunkert, Tim Januschowski, and Stephan Günnemann. Multi-objective model selection for time series forecasting. arXiv preprint arXiv:2202.08485, 2022

  8. [8]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...

  9. [9]

    Neural Contextual Anomaly Detection for Time Series

    Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. Neural Contextual Anomaly Detection for Time Series. arXiv:2107.07702, 2021

  10. [10]

    N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

    Cristian Challu, Kin G Olivares, Boris N Oreshkin, Federico Garza Ramirez, Max Mergenthaler Canseco, and Artur Dubrawski. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023

  11. [11]

    A neural network approach to ordinal regression

    Jianlin Cheng, Zheng Wang, and Gianluca Pollastri. A neural network approach to ordinal regression. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pp. 1279--1284. IEEE, 2008

  12. [12]

    PaLM: Scaling Language Modeling with Pathways

    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research, 24(240):1--113, 2023

  13. [13]

    Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling Instruction-Finetuned Language Models. arXiv:2210.11416, 2022

  14. [14]

    FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

    Tri Dao. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. arXiv:2307.08691, 2023

  15. [15]

    TSMix: time series data augmentation by mixing sources

    Luke Nicholas Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, and Adam Barker. TSMix: time series data augmentation by mixing sources. In Proceedings of the 3rd Workshop on Machine Learning and Systems, pp. 109--114, 2023

  16. [16]

    A decoder-only foundation model for time-series forecasting

    Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv:2310.10688, 2023

  17. [17]

    The UCR Time Series Classification Archive, October 2018

    Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. The UCR Time Series Classification Archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  18. [18]

    LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. arXiv:2208.07339, 2022

  19. [19]

    SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling

    Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling. arXiv:2302.00861, 2023

  20. [20]

    ForecastPFN: Synthetically-Trained Zero-Shot Forecasting

    Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, and Colin White. ForecastPFN: Synthetically-Trained Zero-Shot Forecasting. In Advances in Neural Information Processing Systems, 2023

  21. [21]

    Structure Discovery in Nonparametric Regression through Compositional Kernel Search

    David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. Structure Discovery in Nonparametric Regression through Compositional Kernel Search. In International Conference on Machine Learning, pp. 1166--1174. PMLR, 2013

  22. [22]

    BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

    Patrick Emami, Abhijeet Sahu, and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. arXiv:2307.00142, 2023

  23. [23]

    Hierarchical Neural Story Generation

    Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical Neural Story Generation. arXiv:1805.04833, 2018

  24. [24]

    Stop regressing: Training value functions via classification for scalable deep rl

    Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, et al. Stop regressing: Training value functions via classification for scalable deep rl. arXiv preprint arXiv:2403.03950, 2024

  25. [25]

    How not to lie with statistics: the correct way to summarize benchmark results

    Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM, 29(3):218--221, 1986

  26. [26]

    Beam Search Strategies for Neural Machine Translation

    Markus Freitag and Yaser Al-Onaizan. Beam Search Strategies for Neural Machine Translation. arXiv:1702.01806, 2017

  27. [27]

    Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding, November 2023

    Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding, November 2023. URL https://lmsys.org/blog/2023-11-21-lookahead-decoding/

  28. [28]

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv:2101.00027, 2020

  29. [29]

    StatsForecast: Lightning fast forecasting with statistical and econometric models

    Federico Garza, Max Mergenthaler Canseco, Cristian Challú, and Kin G. Olivares. StatsForecast: Lightning fast forecasting with statistical and econometric models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/statsforecast

  30. [30]

    Probabilistic Forecasting with Spline Quantile Function RNNs

    Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. ...

  31. [31]

    Strictly proper scoring rules, prediction, and estimation

    Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359--378, 2007

  32. [32]

    Monash Time Series Forecasting Archive

    Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash Time Series Forecasting Archive. In Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

  33. [33]

    Moment: A family of open time-series foundation models

    Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024

  34. [34]

    Large Language Models Are Zero-Shot Time Series Forecasters

    Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large Language Models Are Zero-Shot Time Series Forecasters. In Advances in Neural Information Processing Systems, 2023

  35. [35]

    The Curious Case of Neural Text Degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv:1904.09751, 2019

  36. [36]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations, 2022

  37. [37]

    Transformer-based deep survival analysis

    Shi Hu, Egill Fridgeirsson, Guido van Wingen, and Max Welling. Transformer-based deep survival analysis. In Survival Prediction-Algorithms, Challenges and Applications, pp. 132--148. PMLR, 2021

  38. [38]

    Forecasting with exponential smoothing: the state space approach

    Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008

  39. [39]

    Forecasting: principles and practice

    Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2018

  40. [40]

    Another look at measures of forecast accuracy

    Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International journal of forecasting, 22(4):679--688, 2006

  41. [41]

    Deep learning for time series classification: a review

    Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. Deep learning for time series classification: a review. Data mining and knowledge discovery, 33(4):917--963, 2019

  42. [42]

    Time-LLM: Time series forecasting by reprogramming large language models

    Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogramming large language models. In The Twelfth International Conference on Learning Representations, 2024

  43. [43]

    Domain adaptation for time series forecasting via attention sharing

    Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In International Conference on Machine Learning, pp. 10280--10297. PMLR, 2022

  44. [44]

    LightGBM: A Highly Efficient Gradient Boosting Decision Tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in neural information processing systems, 30, 2017

  45. [45]

    Quantile regression

    Roger Koenker and Kevin F Hallock. Quantile regression. Journal of economic perspectives, 15(4):143--156, 2001

  46. [46]

    A classification of business forecasting problems

    Stephan Kolassa and Tim Januschowski. A classification of business forecasting problems. Foresight, 52, 2019

  47. [47]

    Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

    Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Wang. Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting. In Advances in Neural Information Processing Systems, volume 36, pp. 28341--28364. Curran Associates, Inc., 2023

  48. [48]

    Fast inference from transformers via speculative decoding

    Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning, pp. 19274--19286. PMLR, 2023

  49. [49]

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv:1910.13461, 2019

  50. [50]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting

    Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748--1764, 2021

  51. [51]

    Largest: A benchmark dataset for large-scale traffic forecasting

    Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. Largest: A benchmark dataset for large-scale traffic forecasting. arXiv:2306.08259, 2023

  52. [52]

    The M3-Competition: results, conclusions and implications

    Spyros Makridakis and Michele Hibon. The M3-Competition: results, conclusions and implications. International journal of forecasting, 16(4):451--476, 2000

  53. [53]

    Accuracy of forecasting: An empirical investigation

    Spyros Makridakis, Michele Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society. Series A (General), 142(2):97--145, 1979

  54. [54]

    The M4 Competition: 100,000 time series and 61 forecasting methods

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54--74, 2020

  55. [55]

    M5 accuracy competition: Results, findings, and conclusions

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38(4):1346--1364, 2022

  56. [56]

    Regression models for ordinal data

    Peter McCullagh. Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2):109--127, 1980

  57. [57]

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv:1609.07843, 2016

  58. [58]

    Large language models as general pattern machines

    Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, and Andy Zeng. Large language models as general pattern machines. In Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning Research, pp. 2498--2518. PMLR, 2023

  59. [59]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023

  60. [60]

    NeuralForecast: User friendly state-of-the-art neural forecasting models

    Kin G. Olivares, Cristian Challú, Federico Garza, Max Mergenthaler Canseco, and Artur Dubrawski. NeuralForecast: User friendly state-of-the-art neural forecasting models. PyCon Salt Lake City, Utah, US 2022, 2022. URL https://github.com/Nixtla/neuralforecast

  61. [61]

    WaveNet: A Generative Model for Raw Audio

    Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv:1609.03499, 2016

  62. [62]

    N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2020

  63. [63]

    Meta-learning framework with applications to zero-shot time-series forecasting

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

  64. [64]

    Bernardo Pérez Orozco and Stephen J. Roberts. Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks. In 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 503--508, 2020

  65. [65]

    Learning quantile functions without quantile crossing for distribution-free time series forecasting

    Youngsuk Park, Danielle Maddix, François-Xavier Aubet, Kelvin Kan, Jan Gasthaus, and Yuyang Wang. Learning quantile functions without quantile crossing for distribution-free time series forecasting. In International Conference on Artificial Intelligence and Statistics, pp. 8127--8150. PMLR, 2022

  66. [66]

    A simple combination of univariate models

    Fotios Petropoulos and Ivan Svetunkov. A simple combination of univariate models. International journal of forecasting, 36(1):110--115, 2020

  67. [67]

    The effectiveness of discretization in forecasting: An empirical study on neural time series models

    Stephan Rabanser, Tim Januschowski, Valentin Flunkert, David Salinas, and Jan Gasthaus. The effectiveness of discretization in forecasting: An empirical study on neural time series models. arXiv:2005.10111, 2020

  68. [68]

    Language models are unsupervised multitask learners

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019

  69. [69]

    Exploring the limits of transfer learning with a unified text-to-text transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485--5551, 2020

  70. [70]

    Integrating multimodal information in large pretrained transformers

    Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, and Ehsan Hoque. Integrating multimodal information in large pretrained transformers. In Proceedings of the conference. Association for Computational Linguistics. Meeting, volume 2020, pp. 2359. NIH Public Access, 2020

  71. [71]

    Deep state space models for time series forecasting

    Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018

  72. [72]

    Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting

    Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International Conference on Machine Learning, pp. 8857--8868. PMLR, 2021

  73. [73]

    Lag-llama: Towards foundation models for time series forecasting, 2023

    Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-llama: Towards foundation models for time series forecasting, 2023

  74. [74]

    Conformalized quantile regression

    Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. Advances in neural information processing systems, 32, 2019

  75. [75]

    Latent ordinary differential equations for irregularly-sampled time series

    Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems, 32, 2019

  76. [76]

    Deepar: Probabilistic forecasting with autoregressive recurrent networks

    David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181--1191, 2020

  77. [77]

    Neural Machine Translation of Rare Words with Subword Units

    Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv:1508.07909, 2015

  78. [78]

    Autogluon--timeseries: Automl for probabilistic time series forecasting

    Oleksandr Shchur, Ali Caner Turkmen, Nick Erickson, Huibin Shen, Alexander Shirkov, Tony Hu, and Bernie Wang. Autogluon--timeseries: Automl for probabilistic time series forecasting. In International Conference on Automated Machine Learning, pp. 9--1. PMLR, 2023

  79. [79]

    Conformal time-series forecasting

    Kamile Stankeviciute, Ahmed M Alaa, and Mihaela van der Schaar. Conformal time-series forecasting. Advances in neural information processing systems, 34:6216--6228, 2021

  80. [80]

    Regression as classification: Influence of task formulation on neural network features

    Lawrence Stewart, Francis Bach, Quentin Berthet, and Jean-Philippe Vert. Regression as classification: Influence of task formulation on neural network features. In International Conference on Artificial Intelligence and Statistics, pp. 11563--11582. PMLR, 2023

Showing first 80 references.