arxiv: 2603.20179 · v3 · pith:LOODI5VEnew · submitted 2026-03-20 · ✦ hep-ex · cs.AI · cs.LG

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

classification ✦ hep-ex · cs.AI · cs.LG
keywords analysis · physics · agents · autonomously · data · energy · experimental · high

Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a HEP dataset, an execution framework, and a corpus of prior experimental literature, we find that Claude Code succeeds in automating all stages of a typical analysis: event selection, background estimation, uncertainty quantification, statistical inference, and paper drafting. We argue that the experimental HEP community is underestimating the current capabilities of these systems, and that most proposed agentic workflows are too narrowly scoped or scaffolded to specific analysis structures. We present a proof-of-concept framework, Just Furnish Context (JFC), that integrates autonomous analysis agents with literature-based knowledge retrieval and multi-agent review, and show that this is sufficient to plan, execute, and document a credible high energy physics analysis. We demonstrate this by conducting analyses on open data from ALEPH, DELPHI, and CMS to perform electroweak, QCD, and Higgs boson measurements. We present two of those results in a condensed short paper form -- a CMS Run1 Open Data $H\to \tau^+\tau^-$ to demonstrate performance on a well-established result, and the first Lund plane measurement on LEP data -- a genuinely novel result and, to our knowledge, the first produced autonomously by an AI agent. Rather than replacing physicists, these tools promise to offload the repetitive technical burden of analysis code development, freeing researchers to focus on physics insight, truly novel method development, and rigorous validation. Given these developments, we advocate for new strategies for how the community trains students, organizes analysis efforts, and allocates human expertise.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgentRivet: an automated system for producing Rivet routines from journal publications

    hep-ex 2026-06 unverdicted novelty 7.0

    AgentRivet applies commercial LLMs in an autonomous workflow to extract physics details from ATLAS and CMS papers and generate Rivet routines, achieving few syntax errors but occasional physics implementation issues o...

  2. Large Language Model-Assisted Framework for BSM Model Building

    hep-ph 2026-06 unverdicted novelty 6.0

    An open-source framework that automates BSM Lagrangian construction, anomaly checks, and mass-matrix derivation from natural-language field specifications by using an LLM only as an orchestration layer over a determin...

  3. Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis

    hep-ex 2026-06 unverdicted novelty 6.0

    Agentic hybrid RAG with a new muon collider benchmark outperforms baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding.

  4. RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis

    hep-ph 2026-05 unverdicted novelty 6.0

    RooAgent provides an LLM agent interface that translates natural-language prompts into calls to PyROOT analysis functions for high energy physics tasks, with support for multiple AI backends and tested on ZH simulatio...

  5. Analytical and Machine Learning Methods for Model Discernment at CE$\nu$NS Experiments

    hep-ph 2026-04 unverdicted novelty 6.0

    Shape correlations in CEνNS allow likelihood and CNN analyses to discriminate sterile neutrinos from NSI and approximately localize sterile parameters in favorable regions.

  6. A Scientific Human-Agent Reproduction Pipeline

    hep-ph 2026-04 unverdicted novelty 6.0

    SHARP is a human-AI collaboration pipeline for reproducing scientific analyses, demonstrated by recreating a jet classification task from a particle physics paper.

  7. Development of an LLM-Based System for Automatic Code Generation from HEP Publications

    physics.data-an 2026-04 unverdicted novelty 6.0

    A two-stage LLM system extracts structured analysis selections from HEP papers and references then generates and validates executable code, achieving partial event-level matches on an ATLAS Higgs-to-four-leptons bench...

  8. EQSANS-CLI: A natural-language, agent-ready command-line tool for small-angle neutron scattering data reduction at EQ-SANS

    physics.ins-det 2026-05 unverdicted novelty 5.0

    EQSANS-CLI organizes SANS data reduction into a coherent CLI with persistent decision tables, dual input modes, and agent-driven natural language operation via a Slack bot.

  9. Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows

    cs.AI 2026-04 unverdicted novelty 4.0

    CMBAgent achieves high accuracy on well-specified astrophysical tasks with context but generates silent, plausible-yet-incorrect outputs on reasoning-challenging problems, with no self-diagnosis of inconsistencies.

  10. An AI-based Detector Simulation and Reconstruction Model for the ALEPH Experiment at LEP

    physics.ins-det 2026-04 unverdicted novelty 4.0

    Parnassus faithfully reproduces the ALEPH detector response at event, jet, and particle levels for clean e+e- to Z to qqbar events.