Position: Standard benchmarks fail -- auditing LLM agents in finance must prioritize risk, 2025

7 Pith papers cite this work. Polarity classification is still indexing.

Years: 2026 (7)

Verdicts: unverdicted (7)

representative citing papers

Flaws in the LLM Automation Narrative

stat.OT · 2026-06-09 · unverdicted · novelty 7.0

In a new code-writing data analysis benchmark, human experts outperform a frontier LLM on average and show lower performance variance.
