SheetAgent: Towards a generalist agent for spreadsheet reasoning and manipulation via large language models
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: baseline (1)
polarities: baseline (1)
representative citing papers
Spreadsheet-RL applies RL fine-tuning and a custom Gym environment to raise LLM agent Pass@1 scores on spreadsheet benchmarks from roughly 8-12% to 17-23%.
SpreadsheetAgent uses incremental multi-format reading, structural sketching, and verification to raise spreadsheet benchmark accuracy from 35.27% to 38.16%.
citing papers explorer
- BlueFin: Benchmarking LLM Agents on Financial Spreadsheets
  BlueFin is a new benchmark for LLM agents on financial spreadsheets showing frontier models score below 50% with weaknesses in dynamic correctness.
- Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
  Spreadsheet-RL applies RL fine-tuning and a custom Gym environment to raise LLM agent Pass@1 scores on spreadsheet benchmarks from roughly 8-12% to 17-23%.
- Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning
  SpreadsheetAgent uses incremental multi-format reading, structural sketching, and verification to raise spreadsheet benchmark accuracy from 35.27% to 38.16%.