Model Card and Evaluations for Claude Models

Anthropic · 2023

2 Pith papers cite this work.

representative citing papers

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

cs.AI · 2024-08-12 · unverdicted · novelty 8.0

The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems, which reach 15%, and proposes closing this gap as a key milestone for general AI.

citing papers explorer

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery cs.AI · 2024-08-12 · unverdicted · none · ref 3
GAIA: a benchmark for General AI Assistants cs.CL · 2023-11-21 · unverdicted · none · ref 153