Recognition: unknown
From Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Model
Pith reviewed 2026-05-08 02:29 UTC · model grok-4.3
The pith
A multi-agent AI framework retrieves measurements and reconstructs exclusion limits directly from HEPData for consistent comparisons across beyond-Standard-Model searches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HEP-CoPilot unifies textual information from publications, structured experimental data from HEPData, and reconstructed physics plots within a multimodal retrieval and reasoning architecture. By combining retrieval-augmented language models with coordinated agent workflows, the framework enables evidence-grounded reasoning over experimental analyses and structured interpretation of collider results, as shown in evaluations on recent CMS searches for physics beyond the Standard Model.
What carries the argument
The HEP-CoPilot retrieval-augmented multi-agent framework, which coordinates agents to retrieve, interpret, and cross-compare data from text, HEPData records, and plots for BSM searches.
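The coordination pattern described above can be sketched minimally: a coordinator fans a physics question out to specialist agents (text retrieval, HEPData tables, plot reconstruction) and collects their evidence for downstream reasoning. All class and method names here are illustrative assumptions, not HEP-CoPilot's actual API.

```python
# Hypothetical sketch of the multi-agent fan-out; not HEP-CoPilot's real code.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str   # which modality answered: "text", "hepdata", or "plot"
    content: str  # retrieved snippet, table reference, or digitized curve

@dataclass
class Agent:
    name: str
    def handle(self, query: str) -> Evidence:
        # A real agent would call a retriever or an LLM; this stub only
        # records which modality produced the evidence.
        return Evidence(source=self.name, content=f"{self.name} result for: {query}")

@dataclass
class Coordinator:
    agents: list = field(default_factory=list)
    def run(self, query: str) -> list:
        # Fan the query out to every specialist and gather grounded evidence.
        return [agent.handle(query) for agent in self.agents]

coordinator = Coordinator(agents=[Agent("text"), Agent("hepdata"), Agent("plot")])
evidence = coordinator.run("exclusion limit for long-lived particles")
assert [e.source for e in evidence] == ["text", "hepdata", "plot"]
```

In a fuller design each `Agent.handle` would wrap a retriever over a different store (publication index, HEPData records, reconstructed plots), with the coordinator merging the `Evidence` list into a single grounded answer.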
If this is right
- Physicists gain the ability to perform consistent, physics-aware comparisons of experimental constraints across multiple analyses without manual data integration.
- The time required to navigate and structure heterogeneous evidence from high-energy physics literature is reduced.
- The pipeline for interpreting collider results in searches for new physics is accelerated through automated evidence grounding.
- Retrieval-augmented AI systems can serve as scientific co-pilots that handle multimodal experimental data in particle physics.
Where Pith is reading between the lines
- Similar architectures could be tested on other experimental fields that combine text, plots, and tabular data, such as astrophysics or materials science.
- Systematic comparisons enabled by the tool might reveal previously unnoticed gaps in experimental coverage across related BSM searches.
- Direct links from the reconstructed limits to theoretical model parameters could be added to automate viability assessments for new physics scenarios.
Load-bearing premise
The multi-agent retrieval and reasoning architecture can accurately interpret and reconstruct complex physics plots and heterogeneous experimental data without introducing significant errors or misinterpretations.
What would settle it
A controlled test case in which the system reconstructs exclusion limits from two specific CMS HEPData entries, performs a cross-comparison, and the numerical outputs or limit curves are checked against independent manual extraction for mismatches in key values or conclusions.
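The comparison step of such a test is mechanically simple once both limit curves exist as point lists: interpolate the AI-reconstructed curve and the manually extracted curve onto a common mass grid and report the maximum relative deviation. A minimal sketch, with made-up illustrative numbers (picobarn-scale limits versus mass in GeV) rather than values from any real HEPData record:

```python
# Sketch of the proposed validation: compare an AI-reconstructed exclusion
# curve against a manual extraction on a shared mass grid. Data are invented.

def interp(x, xs, ys):
    """Piecewise-linear interpolation of the curve (xs, ys) at point x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside curve range")

def max_relative_deviation(curve_a, curve_b, grid):
    """Largest |a - b| / b over the grid, with b as the reference curve."""
    xs_a, ys_a = curve_a
    xs_b, ys_b = curve_b
    return max(abs(interp(m, xs_a, ys_a) - interp(m, xs_b, ys_b))
               / interp(m, xs_b, ys_b) for m in grid)

# AI-reconstructed vs. manually digitized upper limits (illustrative, pb).
ai_curve     = ([100, 200, 300], [1.00, 0.40, 0.10])
manual_curve = ([100, 200, 300], [1.02, 0.39, 0.10])
dev = max_relative_deviation(ai_curve, manual_curve, grid=[150, 250])
assert dev < 0.05  # curves agree to within 5% on the shared grid
```

A mismatch in key values would show up as a large `dev`; a mismatch in conclusions would show up when the two curves cross a model's predicted cross section at different masses.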
Original abstract
Modern searches for physics beyond the Standard Model produce rapidly expanding literature containing heterogeneous information, including textual analyses, numerical datasets, and graphical exclusion limits. Integrating these distributed sources remains a time-consuming and manual process for physicists. We present HEP-CoPilot, a retrieval-augmented multi-agent AI framework for the exploration and interpretation of high-energy physics literature. The system unifies textual information from publications, structured experimental data from HEPData, and reconstructed physics plots within a multimodal retrieval and reasoning architecture. By combining retrieval-augmented language models with coordinated agent workflows, it enables evidence-grounded reasoning over experimental analyses and structured interpretation of collider results. We evaluate the framework on recent CMS searches for physics beyond the Standard Model. Case studies show that HEP-CoPilot can retrieve relevant measurements, reconstruct exclusion limits directly from HEPData records, and perform cross-paper comparisons of experimental constraints. This enables consistent, physics-aware comparison across analyses without manual data integration. These results demonstrate that retrieval-augmented AI systems can function as scientific co-pilots for particle physics, facilitating navigation of complex literature, structuring heterogeneous evidence, and accelerating the interpretation pipeline for new physics searches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HEP-CoPilot, a retrieval-augmented multi-agent AI framework that unifies textual information from publications, structured data from HEPData, and reconstructed physics plots to enable evidence-grounded reasoning over BSM searches. Evaluated via case studies on recent CMS analyses, the system is claimed to retrieve relevant measurements, reconstruct exclusion limits directly from HEPData records, and perform consistent cross-paper comparisons of experimental constraints without manual data integration.
Significance. If the accuracy claims hold, the framework could meaningfully reduce the manual effort required to synthesize heterogeneous experimental results in high-energy physics, enabling faster and more consistent interpretation of collider constraints across analyses. The coordinated multi-agent architecture for multimodal retrieval represents a concrete engineering contribution to AI-assisted scientific workflows in particle physics.
major comments (1)
- Abstract and evaluation description: The central claim that HEP-CoPilot enables 'accurate' reconstruction of exclusion limits and 'consistent, physics-aware comparison' rests entirely on qualitative case studies, with no quantitative metrics reported (e.g., no precision/recall or fidelity scores for contour reconstruction from HEPData, no error rates on plot digitization or constraint interpretation, and no direct comparison of AI-derived limits against manual extractions or published results). This is load-bearing for the assertion of reliable cross-paper comparisons.
minor comments (2)
- The manuscript would benefit from explicit details on the multi-agent coordination protocol, the specific retrieval mechanisms for heterogeneous data formats, and any safeguards against misinterpretation of physics constraints in the reasoning workflow.
- Figure captions and text should clarify which specific CMS searches were used in the case studies and whether the reconstructed limits were validated against the original publications.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the framework's potential. We address the major comment on the evaluation below, committing to strengthen the manuscript accordingly.
Point-by-point responses
Referee: Abstract and evaluation description: The central claim that HEP-CoPilot enables 'accurate' reconstruction of exclusion limits and 'consistent, physics-aware comparison' rests entirely on qualitative case studies, with no quantitative metrics reported (e.g., no precision/recall or fidelity scores for contour reconstruction from HEPData, no error rates on plot digitization or constraint interpretation, and no direct comparison of AI-derived limits against manual extractions or published results). This is load-bearing for the assertion of reliable cross-paper comparisons.
Authors: We agree that the evaluation in the current manuscript is based on qualitative case studies, which demonstrate the end-to-end workflow but do not include quantitative benchmarks. This is a valid observation regarding the strength of the claims. In the revised version, we will expand the evaluation section to include quantitative metrics: (i) precision and recall for the retrieval of relevant HEPData records and publications across a larger test set of BSM searches; (ii) fidelity scores (e.g., Hausdorff distance or area overlap) for reconstructed exclusion contours compared against manually digitized published limits; and (iii) direct side-by-side comparisons of AI-derived constraints versus published results for at least five additional analyses. These additions will provide measurable evidence for the reliability of cross-paper comparisons while preserving the illustrative value of the existing case studies.
Revision: yes
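The contour-fidelity metric the authors promise, the Hausdorff distance between a reconstructed and a published exclusion contour, is straightforward to compute over point lists in the (mass, coupling) plane. A minimal sketch with invented contour points, not values from any real analysis:

```python
# Symmetric Hausdorff distance between two 2D exclusion contours, each a
# list of (mass, coupling) points. Contour data below are illustrative only.
from math import dist

def hausdorff(contour_a, contour_b):
    """Largest distance from any point of one contour to the other contour."""
    d_ab = max(min(dist(p, q) for q in contour_b) for p in contour_a)
    d_ba = max(min(dist(p, q) for q in contour_a) for p in contour_b)
    return max(d_ab, d_ba)

reconstructed = [(100, 0.10), (200, 0.05), (300, 0.02)]
published     = [(100, 0.10), (200, 0.06), (300, 0.02)]
assert abs(hausdorff(reconstructed, published) - 0.01) < 1e-12
```

In practice the axes should be normalized (or distances taken in log space) first, since mass and coupling differ by orders of magnitude; `scipy.spatial.distance.directed_hausdorff` offers an optimized equivalent of each one-sided term.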
Circularity Check
No significant circularity; framework evaluated on external data
Full rationale
The paper describes an engineering system (HEP-CoPilot) for retrieving and interpreting HEP literature and data, with evaluation via case studies on independent CMS BSM searches and HEPData records. No mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems appear in the provided text. Claims rest on the system's behavior with external sources rather than any self-referential reduction, self-citation chains, or ansatzes smuggled from prior author work. This matches the default expectation for non-circularity in system-description papers without internal derivation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Retrieval-augmented language models combined with multi-agent coordination can produce accurate, evidence-grounded interpretations of experimental physics results.
invented entities (1)
- HEP-CoPilot (no independent evidence)
Reference graph
Works this paper leans on
- [1] ATLAS Collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, Journal of Instrumentation 3 (2008) S08003
- [2] CMS Collaboration, The CMS Experiment at the CERN LHC, Journal of Instrumentation 3 (2008) S08004
- [3] B. Denby, Neural Networks for Pattern Recognition in High-Energy Physics Events, Computer Physics Communications 49 (1988) 429–448
- [4] A. Radovic et al., Machine Learning at the Energy and Intensity Frontiers of Particle Physics, Nature 560 (2018) 41–48
- [5] R. Ramprasad et al., Large Language Models in Science, arXiv:2501.05382 (2025)
- [6] P. Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS (2020), arXiv:2005.11401
- [7]
- [8] S. Pramanick, S. Ghosh et al., SPIQA: Benchmarking Scientific Paper Information Extraction and Question Answering, ACL Findings (2024), arXiv:2408.06292
- [9]
- [10]
- [11] T. Hellert et al., PhysBERT: A Text Embedding Model for Physics Literature, arXiv:2403.08367 (2024)
- [12]
- [13]
- [14] E. Maguire et al., HEPData: a repository for high energy physics data, Journal of Physics: Conference Series 898 (2017) 102006
- [15] L. Zheng et al., Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, arXiv:2306.05685 (2023)
- [16] CMS Collaboration, Search for heavy long-lived charged particles with large ionization energy loss in proton-proton collisions at √s = 13 TeV, Physical Review D 111 (2025) 012011
- [17] CMS Collaboration, Search for light long-lived particles decaying to displaced jets in proton-proton collisions at √s = 13.6 TeV, Reports on Progress in Physics 88 (2025) 037801
- [18] CMS Collaboration, Search for top squarks in final states with many light-flavor jets and 0, 1, or 2 charged leptons in proton-proton collisions at √s = 13 TeV, Journal of High Energy Physics 10 (2025) 236
- [19] Z. Lu et al., AI Scientist: Towards Fully Automated Scientific Discovery, arXiv:2502.18864 (2024)
- [20] Google Research, Accelerating Scientific Breakthroughs with an AI Co-Scientist, Google Research Blog (2024)
- [21] H. Xu et al., Large-Scale Multi-Agent Debate Improves Scientific Review, arXiv:2311.16446 (2023)
- [22] J. Duarte, Novel Machine Learning Applications at the LHC, Proceedings of ICHEP 2024, arXiv:2409.20413