Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs
8 Pith papers cite this work.
abstract
To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm: their core modeling philosophy is to extract historical patterns and map them to future values, without an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone has several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.
citing papers explorer
- Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
  Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.
- CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting
  CastFlow introduces a role-specialized agentic workflow with memory retrieval and multi-view toolkit for iterative ensemble time series forecasting, using two-stage SFT+RLVR training on a domain-specific LLM to outperform static baselines.
- LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
  LLaTiSA is a vision-language model trained on a new 83k-sample hierarchical time series reasoning dataset that shows superior performance and out-of-distribution generalization on stratified TSR tasks.
- STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning
  STReasoner uses S-GRPO reinforcement learning to let LLMs integrate time series, graphs, and text for spatio-temporal reasoning, delivering 17-135% accuracy gains over baselines on a new four-task benchmark at 0.004X the cost of proprietary models.
- GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification
  GeoDecider introduces a coarse-to-fine agentic workflow using LLMs for explainable lithology classification from well logs, combining a base classifier, tool-augmented reasoning, and geological refinement to outperform baselines on benchmarks.
- GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation
  GeoMind applies an agentic workflow with tool-augmented modules and process supervision to outperform static models on lithology classification from well logs while producing traceable decisions.
- Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs
  StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
- TimeRFT: Stimulating Generalizable Time Series Forecasting for TSFMs via Reinforcement Finetuning
  TimeRFT applies reinforcement learning with multi-faceted step-wise rewards and informative sample selection to improve generalization and accuracy in TSFM adaptation beyond supervised fine-tuning.
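
The Distribution-Aware Reward entry above refers to CRPS (continuous ranked probability score) computed over rollouts treated as an empirical predictive distribution. As a rough illustration of that quantity only, the sketch below computes the standard sample-based CRPS estimate, mean |x_i - y| minus half the mean |x_i - x_j|, for a set of scalar predictions. The function name `empirical_crps` and the example values are assumptions made here for illustration; how the cited paper converts CRPS into a per-rollout reward is not described on this page and is not reproduced.

```python
import numpy as np


def empirical_crps(samples, y):
    """Sample-based CRPS estimate for scalar predictions (hypothetical helper).

    Lower is better; with a single sample it reduces to absolute error.
    """
    x = np.asarray(samples, dtype=float)
    # Average absolute error of each sample against the observed value.
    accuracy = np.mean(np.abs(x - y))
    # Average absolute difference over all sample pairs (i == j pairs add zero).
    spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return accuracy - 0.5 * spread


# Illustrative values only: three rollouts predicting a target of 2.0.
rollouts = [1.0, 2.0, 3.0]
score = empirical_crps(rollouts, 2.0)
print(score)
```

Unlike plain absolute error, this score depends on the spread of the whole set of rollouts as well as on each prediction's distance from the target, which is what lets it evaluate the rollouts as a distribution.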