Zhiruo Wang, Grace Cuenca, Shuyan Zhou, Frank F. Xu, and Graham Neubig · 2023 · DOI 10.18653/v1/2023.findings-eacl.20

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

cs.SE · 2026-05-29 · unverdicted · novelty 6.0

BlueFin is a new benchmark for LLM agents on financial spreadsheets showing frontier models score below 50% with weaknesses in dynamic correctness.


BlueFin: Benchmarking LLM Agents on Financial Spreadsheets · cs.SE · 2026-05-29 · unverdicted · none · ref 24