LLM cyber evaluations don’t capture real-world risk. arXiv preprint arXiv:2502.00072

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

EvalCards is a composable reporting schema and monitoring tool for AI evaluations, derived from 52 papers and 10 interviews, and applied to 5,816 models and 101,843 results to surface reporting gaps.

The coordination gap in frontier AI safety policies

cs.CY · 2026-02-21 · unverdicted · novelty 4.0

Frontier AI safety policies have a structural coordination gap caused by diffuse benefits and concentrated costs, which can be addressed by adapting precommitment and shared response protocols from other high-risk domains.

citing papers explorer

Showing 2 of 2 citing papers.

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
cs.AI · 2026-06-08 · unverdicted · none · ref 69

The coordination gap in frontier AI safety policies
cs.CY · 2026-02-21 · unverdicted · none · ref 16
