Collider-Bench is a new benchmark showing that current LLM agents cannot reliably reproduce LHC analyses at the level of a physicist-in-the-loop.
An End-to-end Architecture for Collider Physics and Beyond
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
PRL-Bench evaluates frontier LLMs on 100 real physics research tasks and finds the best models score below 50, exposing a gap to autonomous discovery.
AutoResearchClaw introduces a multi-agent research pipeline with debate, self-healing, verifiable outputs, human collaboration modes, and cross-run evolution that outperforms AI Scientist v2 by 54.7% on ARC-Bench.
EasyScan_HEP 2 adds AI-agent interfaces to a HEP parameter scan framework for natural-language to .ini config translation and new sampler integration.
citing papers explorer
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration