LLM cyber evaluations don’t capture real-world risk. arXiv preprint arXiv:2502.00072
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
EvalCards is a composable reporting schema and monitoring tool for AI evaluations, derived from 52 papers and 10 interviews, and applied to 5,816 models and 101,843 results to surface reporting gaps.
- The coordination gap in frontier AI safety policies
Frontier AI safety policies have a structural coordination gap caused by diffuse benefits and concentrated costs, which can be addressed by adapting precommitment and shared response protocols from other high-risk domains.