BloombergGPT: A Large Language Model for Finance

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann · 2023 · cs.LG · arXiv 2303.17564

25 Pith papers cite this work. Polarity classification is still indexing.

abstract

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

representative citing papers

PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

q-fin.CP · 2026-04-03 · conditional · novelty 8.0

Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.

AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection

cs.CE · 2026-05-09 · unverdicted · novelty 7.0

AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.

From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets

q-fin.PM · 2026-04-29 · unverdicted · novelty 7.0

Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Stateful sessions with incremental KV cache and flash queries allow O(|q|) latency in streaming transformer inference, delivering up to 5.9x speedup over conventional engines while preserving full attention.

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.

Effective Performance Measurement: Challenges and Opportunities in KPI Extraction from Earnings Calls

cs.CL · 2026-05-04 · unverdicted · novelty 6.0

Encoder models trained on SEC filings struggle with earnings calls due to domain shift, while LLMs enable open-ended KPI extraction with 79.7% human-verified precision on newly introduced benchmarks.

RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization

cs.CL · 2026-04-26 · unverdicted · novelty 6.0

RouteNLP is a closed-loop LLM routing framework using conformal cascading and distillation co-optimization that cut inference costs by 58% in an 8-week enterprise deployment while preserving 91% acceptance and high quality on benchmarks.

Cross-Stock Predictability via LLM-Augmented Semantic Networks

q-fin.PM · 2026-04-21 · unverdicted · novelty 6.0

LLM filtering of embedding-based stock networks raises long-short Sharpe ratio from 0.742 to 0.820 and cuts max drawdown from -10.47% to -7.85% in 2011-2019 S&P 500 backtests.

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

cs.MA · 2026-04-20 · unverdicted · novelty 6.0

QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.

MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model

cs.CE · 2026-04-20 · unverdicted · novelty 6.0

MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.

SenseAI: A Human-in-the-Loop Dataset for RLHF-Aligned Financial Sentiment Reasoning

cs.CL · 2026-04-06 · unverdicted · novelty 6.0

SenseAI is a human-in-the-loop financial sentiment dataset with reasoning processes and market outcomes that reveals predictable LLM error patterns like Latent Reasoning Drift for RLHF alignment.

SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

cs.SE · 2026-04-06 · unverdicted · novelty 6.0

SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.

PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

cs.AI · 2026-04-04 · unverdicted · novelty 6.0

PolySwarm aggregates predictions from 50 LLM personas for Polymarket trading using Bayesian combination and divergence metrics, outperforming single models in calibration while adding latency arbitrage via CEX price models.

CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion

cs.LG · 2026-04-01 · unverdicted · novelty 6.0

CGCMA separates text-conditioned grounding from lag-aware trust gating to fuse asynchronous price and web data, yielding the highest Sharpe ratio of +0.449 on a new crypto news corpus.

Jailbreaking Black Box Large Language Models in Twenty Queries

cs.LG · 2023-10-12 · conditional · novelty 6.0

PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

cs.LG · 2023-10-05 · accept · novelty 6.0

SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.

Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.

FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification

cs.AI · 2026-04-26 · unverdicted · novelty 5.0

FinGround reduces financial hallucinations by 68% over baselines in retrieval-equalized tests through atomic claim verification and grounding, with an 8B model retaining 91.4% F1 at low cost.

When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

cs.CL · 2026-04-13 · unverdicted · novelty 5.0

LLM features optimized for high information coefficient with returns do not reliably improve PPO trading policies under distribution shifts, where price-only or macro baselines remain more robust.

PRAGMA: Revolut Foundation Model

cs.LG · 2026-04-09 · unverdicted · novelty 5.0

PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and lifetime value prediction using linear heads or light fine-tuning.

CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

CROP achieves 80.6% token reduction on GSM8K, LogiQA and BIG-Bench Hard with only nominal accuracy decline by regularizing automatic prompt optimization with response-length feedback.

FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

FinReporting builds a canonical ontology for income, balance, and cash flow statements and uses constrained LLM agents as verifiers to produce localized, auditable reports from US, Japanese, and Chinese filings.

A Multi-Agent Orchestration Framework for Venture Capital Due Diligence

cs.MA · 2026-05-13 · unverdicted · novelty 4.0

A multi-agent orchestration framework automates VC due diligence using LLMs, web retrieval, and a programmatic pipeline to extract and parse official Greek business registry filings while flagging data gaps.

AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems

q-fin.TR · 2026-05-01 · unverdicted · novelty 4.0

AgenticAITA proposes a training-free multi-agent LLM framework for autonomous trading using a deliberative pipeline, Z-score triggers, and safety gates, shown to run correctly in a five-day live dry-run with 157 invocations.

citing papers explorer

Showing 25 of 25 citing papers.

PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data q-fin.CP · 2026-04-03 · conditional · none · ref 21 · internal anchor
Only two of seven LLMs produce positive returns on live Polymarket data, with MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR while the other five lose money.
AutoRedTrader: Autonomous Red Teaming of Trading Agents through Synthetic Misinformation Injection cs.CE · 2026-05-09 · unverdicted · none · ref 30 · internal anchor
AutoRedTrader generates synthetic financial misinformation via behavioral bias manipulation and agent feedback to red-team LLM trading agents, reaching 69% exposure and 26.67% attack success on Bitcoin data simulations.
From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets q-fin.PM · 2026-04-29 · unverdicted · none · ref 8 · internal anchor
Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.
Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers cs.LG · 2026-05-13 · unverdicted · none · ref 2 · internal anchor
Stateful sessions with incremental KV cache and flash queries allow O(|q|) latency in streaming transformer inference, delivering up to 5.9x speedup over conventional engines while preserving full attention.
Agentic Retrieval-Augmented Generation for Financial Document Question Answering cs.AI · 2026-05-06 · unverdicted · none · ref 33 · internal anchor
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
Effective Performance Measurement: Challenges and Opportunities in KPI Extraction from Earnings Calls cs.CL · 2026-05-04 · unverdicted · none · ref 90 · internal anchor
Encoder models trained on SEC filings struggle with earnings calls due to domain shift, while LLMs enable open-ended KPI extraction with 79.7% human-verified precision on newly introduced benchmarks.
RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization cs.CL · 2026-04-26 · unverdicted · none · ref 5 · internal anchor
RouteNLP is a closed-loop LLM routing framework using conformal cascading and distillation co-optimization that cut inference costs by 58% in an 8-week enterprise deployment while preserving 91% acceptance and high quality on benchmarks.
Cross-Stock Predictability via LLM-Augmented Semantic Networks q-fin.PM · 2026-04-21 · unverdicted · none · ref 6 · internal anchor
LLM filtering of embedding-based stock networks raises long-short Sharpe ratio from 0.742 to 0.820 and cuts max drawdown from -10.47% to -7.85% in 2011-2019 S&P 500 backtests.
QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance cs.MA · 2026-04-20 · unverdicted · none · ref 60 · internal anchor
QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.
MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model cs.CE · 2026-04-20 · unverdicted · none · ref 49 · internal anchor
MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.
SenseAI: A Human-in-the-Loop Dataset for RLHF-Aligned Financial Sentiment Reasoning cs.CL · 2026-04-06 · unverdicted · none · ref 4 · internal anchor
SenseAI is a human-in-the-loop financial sentiment dataset with reasoning processes and market outcomes that reveals predictable LLM error patterns like Latent Reasoning Drift for RLHF alignment.
SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics cs.SE · 2026-04-06 · unverdicted · none · ref 18 · internal anchor
SysTradeBench evaluates 17 LLMs on 12 trading strategies, finding over 91.7% code validity but rapid convergence in iterative fixes and a continued need for human oversight on critical strategies.
PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage cs.AI · 2026-04-04 · unverdicted · none · ref 22 · internal anchor
PolySwarm aggregates predictions from 50 LLM personas for Polymarket trading using Bayesian combination and divergence metrics, outperforming single models in calibration while adding latency arbitrage via CEX price models.
CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion cs.LG · 2026-04-01 · unverdicted · none · ref 30 · internal anchor
CGCMA separates text-conditioned grounding from lag-aware trust gating to fuse asynchronous price and web data, yielding the highest Sharpe ratio of +0.449 on a new crypto news corpus.
Jailbreaking Black Box Large Language Models in Twenty Queries cs.LG · 2023-10-12 · conditional · none · ref 2 · internal anchor
PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks cs.LG · 2023-10-05 · accept · none · ref 15 · internal anchor
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics cs.LG · 2026-05-07 · unverdicted · none · ref 10 · internal anchor
SSAI maps news into four factors (sentiment, risk, confidence, volatility) for trading, but factor portfolios, ridge models, and RL agents show no reliable edge over baselines after coverage controls and costs.
FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification cs.AI · 2026-04-26 · unverdicted · none · ref 6 · internal anchor
FinGround reduces financial hallucinations by 68% over baselines in retrieval-equalized tests through atomic claim verification and grounding, with an 8B model retaining 91.4% F1 at low cost.
When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies cs.CL · 2026-04-13 · unverdicted · none · ref 3 · internal anchor
LLM features optimized for high information coefficient with returns do not reliably improve PPO trading policies under distribution shifts, where price-only or macro baselines remain more robust.
PRAGMA: Revolut Foundation Model cs.LG · 2026-04-09 · unverdicted · none · ref 16 · internal anchor
PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and lifetime value prediction using linear heads or light fine-tuning.
CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization cs.CL · 2026-04-08 · unverdicted · none · ref 2 · internal anchor
CROP achieves 80.6% token reduction on GSM8K, LogiQA and BIG-Bench Hard with only nominal accuracy decline by regularizing automatic prompt optimization with response-length feedback.
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures cs.CL · 2026-04-07 · unverdicted · none · ref 2 · internal anchor
FinReporting builds a canonical ontology for income, balance, and cash flow statements and uses constrained LLM agents as verifiers to produce localized, auditable reports from US, Japanese, and Chinese filings.
A Multi-Agent Orchestration Framework for Venture Capital Due Diligence cs.MA · 2026-05-13 · unverdicted · none · ref 8 · internal anchor
A multi-agent orchestration framework automates VC due diligence using LLMs, web retrieval, and a programmatic pipeline to extract and parse official Greek business registry filings while flagging data gaps.
AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems q-fin.TR · 2026-05-01 · unverdicted · none · ref 8 · internal anchor
AgenticAITA proposes a training-free multi-agent LLM framework for autonomous trading using a deliberative pipeline, Z-score triggers, and safety gates, shown to run correctly in a five-day live dry-run with 157 invocations.
ComplianceNLP: Knowledge-Graph-Augmented RAG for Multi-Framework Regulatory Gap Detection cs.CL · 2026-04-26 · unverdicted · none · ref 5 · internal anchor
ComplianceNLP integrates knowledge-graph-augmented RAG, multi-task legal text extraction, and gap analysis to detect regulatory compliance gaps, reporting 87.7 F1 and real-world efficiency gains over GPT-4o baselines.
