AI Agents Can Already Autonomously Perform Experimental High Energy Physics
Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a HEP dataset, an execution framework, and a corpus of prior experimental literature, we find that Claude Code succeeds in automating all stages of a typical analysis: event selection, background estimation, uncertainty quantification, statistical inference, and paper drafting. We argue that the experimental HEP community is underestimating the current capabilities of these systems, and that most proposed agentic workflows are too narrowly scoped or scaffolded to specific analysis structures. We present a proof-of-concept framework, Just Furnish Context (JFC), that integrates autonomous analysis agents with literature-based knowledge retrieval and multi-agent review, and show that this is sufficient to plan, execute, and document a credible high energy physics analysis. We demonstrate this by conducting analyses on open data from ALEPH, DELPHI, and CMS to perform electroweak, QCD, and Higgs boson measurements. We present two of those results in condensed short-paper form: a CMS Run 1 Open Data $H\to \tau^+\tau^-$ measurement, to demonstrate performance on a well-established result, and the first Lund plane measurement on LEP data, a genuinely novel result and, to our knowledge, the first produced autonomously by an AI agent. Rather than replacing physicists, these tools promise to offload the repetitive technical burden of analysis code development, freeing researchers to focus on physics insight, truly novel method development, and rigorous validation. Given these developments, we advocate for new strategies for how the community trains students, organizes analysis efforts, and allocates human expertise.
Forward citations
Cited by 10 Pith papers
-
AgentRivet: an automated system for producing Rivet routines from journal publications
AgentRivet applies commercial LLMs in an autonomous workflow to extract physics details from ATLAS and CMS papers and generate Rivet routines, achieving few syntax errors but occasional physics implementation issues o...
-
Large Language Model-Assisted Framework for BSM Model Building
An open-source framework that automates BSM Lagrangian construction, anomaly checks, and mass-matrix derivation from natural-language field specifications by using an LLM only as an orchestration layer over a determin...
-
Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis
Agentic hybrid RAG with a new muon collider benchmark outperforms baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding.
-
RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis
RooAgent provides an LLM agent interface that translates natural-language prompts into calls to PyROOT analysis functions for high energy physics tasks, with support for multiple AI backends and tested on ZH simulatio...
-
Analytical and Machine Learning Methods for Model Discernment at CE$\nu$NS Experiments
Shape correlations in CEνNS allow likelihood and CNN analyses to discriminate sterile neutrinos from NSI and approximately localize sterile parameters in favorable regions.
-
A Scientific Human-Agent Reproduction Pipeline
SHARP is a human-AI collaboration pipeline for reproducing scientific analyses, demonstrated by recreating a jet classification task from a particle physics paper.
-
Development of an LLM-Based System for Automatic Code Generation from HEP Publications
A two-stage LLM system extracts structured analysis selections from HEP papers and references then generates and validates executable code, achieving partial event-level matches on an ATLAS Higgs-to-four-leptons bench...
-
EQSANS-CLI: A natural-language, agent-ready command-line tool for small-angle neutron scattering data reduction at EQ-SANS
EQSANS-CLI organizes SANS data reduction into a coherent CLI with persistent decision tables, dual input modes, and agent-driven natural language operation via a Slack bot.
-
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows
CMBAgent achieves high accuracy on well-specified astrophysical tasks with context but generates silent, plausible-yet-incorrect outputs on reasoning-challenging problems, with no self-diagnosis of inconsistencies.
-
An AI-based Detector Simulation and Reconstruction Model for the ALEPH Experiment at LEP
Parnassus faithfully reproduces the ALEPH detector response at event, jet, and particle levels for clean e+e- to Z to qqbar events.