Title resolution pending

Standard Benchmarks Fail – Auditing LLM Agents in Finance Must Prioritize Risk · 2025 · arXiv 2502.15865

5 Pith papers cite this work. Polarity classification of these citations is still being indexed.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

FinSafetyBench shows that LLMs remain vulnerable to adversarial prompts that bypass financial compliance safeguards, with notably higher failure rates in Chinese-language scenarios.

OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset

cs.CL · 2026-03-14 · unverdicted · novelty 7.0

OmniCompliance-100K supplies 12,985 distinct rules and 106,009 associated real-world cases from 74 multi-domain regulations to benchmark LLM safety and compliance.

Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

ASR, a new trajectory-fidelity metric, detects that 10 of 18 LLMs skip confirmation steps in payment agents despite perfect scores on prior metrics, and ASR-guided refinements improve task success by up to 93.8 percentage points.

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

cs.MA · 2026-04-20 · unverdicted · novelty 6.0

QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.

Conversations Risk Detection LLMs in Financial Agents via Multi-Stage Generative Rollout

cs.CR · 2026-04-10 · unverdicted · novelty 4.0

FinSec is a multi-stage detection system for financial LLM dialogues that reaches a 90.13% F1 score, cuts the attack success rate to 9.09%, and raises AUPRC to 0.9189.

citing papers explorer

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios cs.CL · 2026-05-01 · unverdicted · none · ref 1
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset cs.CL · 2026-03-14 · unverdicted · none · ref 2
Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems cs.AI · 2026-05-07 · unverdicted · none · ref 4
QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance cs.MA · 2026-04-20 · unverdicted · none · ref 69
Conversations Risk Detection LLMs in Financial Agents via Multi-Stage Generative Rollout cs.CR · 2026-04-10 · unverdicted · none · ref 22
