arXiv preprint arXiv:2310.10688 (2023)
15 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data
  Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.
- TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
  TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.
- FactoryBench: Evaluating Industrial Machine Understanding
  FactoryBench reveals that frontier LLMs achieve under 50% on structured causal questions and under 18% on decision-making in industrial robotic telemetry.
- Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
  Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.
- Chronos: Learning the Language of Time Series
  Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
- MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
  MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
- RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
  RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.
- Continuity Laws for Sequential Models
  S4 models exhibit stable time-continuity unlike sensitive S6 models, with task continuity predicting performance and enabling temporal subsampling for better efficiency.
- FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting
  Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
- Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
  BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.
- Predicting Power-System Dynamic Trajectories with Foundation Models
  LASS-ODE-Power is a pretrained model that predicts power-system dynamic trajectories across regimes in a zero-shot manner after large-scale ODE pretraining and targeted fine-tuning.
- MICA: Multivariate Infini Compressive Attention for Time Series Forecasting
  MICA adapts infini compressive attention to the channel dimension, enabling scalable cross-channel dependencies in Transformers and cutting forecast error by 5.4% on average versus channel-independent baselines.
- Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series
  DynLMC creates synthetic time series data with dynamic inter-channel correlations that improve zero-shot forecasting in foundation models across multiple benchmarks.
- A Quantum Inspired Variational Kernel and Explainable AI Framework for Cross Region Solar and Wind Energy Forecasting
  A hybrid classical-plus-quantum-inspired framework for cross-region renewable energy forecasting matches top baselines within 1% accuracy and separates calm versus stormy conditions with a 15-fold higher Fisher discriminant ratio than a tuned radial basis kernel.
- Degradation-aware Predictive Energy Management for Fuel Cell-Battery Ship Power System with Data-driven Load Forecasting
  A degradation-aware predictive controller for hybrid ship power systems reduces hydrogen consumption by up to 5.8% and fuel cell degradation by up to 36.4% versus a filter-based benchmark on real harbor tug data.