Evaluating Frontier Models for Dangerous Capabilities

As of January 13 · 2026 · DOI 10.1038/s41587-025-02650-8

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · novelty 8.0

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

Measuring Biological Capabilities and Risks of AI Agents

cs.CY · 2026-06-18 · unverdicted · novelty 3.0

Synthesizes current evidence on AI biological risks and provides experience-grounded considerations for defining, running, and interpreting agentic evaluations.

citing papers explorer

Showing 2 of 2 citing papers.

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · none · ref 55

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

Measuring Biological Capabilities and Risks of AI Agents

cs.CY · 2026-06-18 · unverdicted · none · ref 4

Synthesizes current evidence on AI biological risks and provides experience-grounded considerations for defining, running, and interpreting agentic evaluations.
