Collider-Bench is a new benchmark showing that current LLM agents cannot reliably reproduce LHC analyses at the level of a physicist-in-the-loop.
An End-to-end Architecture for Collider Physics and Beyond
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
PRL-Bench evaluates frontier LLMs on 100 real physics research tasks and finds the best models score below 50, exposing a gap to autonomous discovery.
AutoResearchClaw introduces a multi-agent research pipeline with debate, self-healing, verifiable outputs, human collaboration modes, and cross-run evolution that outperforms AI Scientist v2 by 54.7% on ARC-Bench.
EasyScan_HEP 2 adds AI-agent interfaces to a HEP parameter scan framework for natural-language to .ini config translation and new sampler integration.
citing papers explorer
-
Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
Collider-Bench is a new benchmark showing that current LLM agents cannot reliably reproduce LHC analyses at the level of a physicist-in-the-loop.
-
PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research
PRL-Bench evaluates frontier LLMs on 100 real physics research tasks and finds the best models score below 50, exposing a gap to autonomous discovery.
-
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
AutoResearchClaw introduces a multi-agent research pipeline with debate, self-healing, verifiable outputs, human collaboration modes, and cross-run evolution that outperforms AI Scientist v2 by 54.7% on ARC-Bench.
-
EasyScan_HEP 2: Agent-Ready Parameter Scans for High-Energy Physics
EasyScan_HEP 2 adds AI-agent interfaces to a HEP parameter scan framework for natural-language to .ini config translation and new sampler integration.