arXiv: 2604.21804 · v1 · submitted 2026-04-23 · ⚛️ physics.ins-det · hep-ex · hep-ph

Phenomenological Detector Design and Optimization in Vertically-Integrated Differentiable Full Simulations with Agentic-AI

Julia Gonski, Liangyu Wu, Qibin Liu, Wonyong Chung

Pith reviewed 2026-05-08 13:00 UTC · model grok-4.3

classification ⚛️ physics.ins-det · hep-ex · hep-ph

keywords: AI agents, detector optimization, high-energy physics, differentiable simulation, bilevel optimization, electromagnetic calorimeter, parameter tuning

The pith

AI agents using current language models can optimize detector geometry, digitization, and reconstruction parameters together in a single differentiable simulation for high-energy physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that AI agents can be inserted into the full workflow of designing and tuning particle detectors by means of a bilevel optimization loop. This loop runs a differentiable end-to-end simulation that includes crystal layout, front-end electronics, and high-level reconstruction, letting the agent adjust parameters at every layer at once. The demonstration uses a dual-readout segmented crystal electromagnetic calorimeter whose baseline resolution is given as $3\%/\sqrt{E}$. The authors show that off-the-shelf reasoning models, given no extra experiment-specific instructions, can already carry out the multi-step workflow, identify and reduce key detector parameters, and propose further generic improvements. If the result holds, detector R&D moves from sequential expert-driven steps to a more automated traversal of the combined parameter space.
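To make the baseline resolution quoted above concrete, here is a minimal sketch of how a pure stochastic term scales with energy. It assumes the energy is expressed in GeV and that only the stochastic term contributes (no constant or noise term); neither assumption is stated in the source, and the function name `stochastic_resolution` is illustrative only.

```python
import math


def stochastic_resolution(energy_gev: float, stochastic_term: float = 0.03) -> float:
    """Fractional resolution sigma_E / E for a pure stochastic term a / sqrt(E).

    Assumes E is in GeV and ignores constant and noise terms (an assumption,
    not something stated in the source).
    """
    if energy_gev <= 0:
        raise ValueError("energy must be positive")
    return stochastic_term / math.sqrt(energy_gev)


if __name__ == "__main__":
    for energy in (1.0, 9.0, 100.0):
        frac = stochastic_resolution(energy)
        # Absolute resolution is the fractional resolution times the energy.
        print(f"E = {energy:6.1f} GeV  sigma/E = {frac:.4f}  sigma = {frac * energy:.3f} GeV")
```

Under these assumptions the fractional resolution improves as the energy grows, while the absolute resolution grows as $\sqrt{E}$.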

Core claim

The authors present the first implementation of AI agents for detector design in high-energy physics through a bilevel optimization framework. The framework vertically integrates detector geometry, front-end digitization, and high-level reconstruction algorithm parameters inside differentiable full simulations. On the concrete example of a dual-readout segmented crystal electromagnetic calorimeter, the agent simultaneously tunes crystal granularity and length, number of ADC bits, sampling rate, and center-of-gravity hit-clustering radius. The work finds that today’s LLM-based reasoning models, without added experiment-specific context, can execute these complex workflows and suggest relevant avenues for further study or improvement.

What carries the argument

Bilevel optimization framework that vertically integrates detector geometry, front-end digitization, and reconstruction parameters inside a differentiable full simulation, with an LLM-based agent directing the outer loop.
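The nested structure described above can be illustrated with a toy sketch. This is not the authors' implementation: the loss function, its analytic gradient, the parameter names (`width`, `radius`), and the candidate list are all hypothetical, and a plain enumeration stands in for the LLM agent that directs the outer loop in the paper.

```python
from typing import Iterable, Optional, Tuple


def toy_loss(width: float, radius: float) -> float:
    """Hypothetical smooth stand-in for a simulated performance metric."""
    return (radius - 1.5 * width) ** 2 + 0.1 * (width - 2.0) ** 2 + 0.5


def toy_grad_radius(width: float, radius: float) -> float:
    """Analytic d(toy_loss)/d(radius); a differentiable simulation would supply this."""
    return 2.0 * (radius - 1.5 * width)


def inner_optimize(width: float, radius0: float = 1.0,
                   lr: float = 0.1, steps: int = 200) -> Tuple[float, float]:
    """Inner level: gradient descent on the continuous parameter for a fixed outer choice."""
    radius = radius0
    for _ in range(steps):
        radius -= lr * toy_grad_radius(width, radius)
    return radius, toy_loss(width, radius)


def outer_loop(candidate_widths: Iterable[float]) -> Optional[Tuple[float, float, float]]:
    """Outer level: try each proposed outer parameter and keep the best inner result.

    In the paper an agent proposes outer-level choices; here a fixed list is
    enumerated instead, purely for illustration.
    """
    best = None
    for width in candidate_widths:
        radius, loss = inner_optimize(width)
        if best is None or loss < best[2]:
            best = (width, radius, loss)
    return best


if __name__ == "__main__":
    print(outer_loop([1.0, 1.5, 2.0, 2.5, 3.0]))
```

The point of the sketch is the division of labor: the inner level exploits gradients, while the outer level only sees the optimized inner result for each proposal.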

If this is right

Simultaneous optimization of geometry, digitization, and reconstruction parameters becomes feasible in one run instead of sequential manual stages.
Labor and compute required for exploring detector design spaces are reduced.
Computational checks of first-principles design choices become routine rather than exceptional.
Generic but relevant avenues for further study are identified automatically by the agent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same vertically integrated agent loop could be applied to other detector subsystems such as trackers or muon systems to test broader applicability.
Adding modest experiment-specific context or fine-tuning the agent might allow it to make physics-motivated leaps that the current work explicitly does not claim.
Repeated runs on varied calorimeter layouts would provide a direct test of how sensitive the observed optimizations are to the baseline design.

Load-bearing premise

Current LLM-based reasoning models can carry out complex multi-layer detector optimization workflows and suggest relevant improvements without being supplied additional experiment-specific context.

What would settle it

Running the same agent-driven workflow on the dual-readout calorimeter and finding that it fails to reduce any of the listed parameters or produces no relevant improvement suggestions would show the claimed effectiveness does not hold.

Figures

Figures reproduced from arXiv: 2604.21804 by Julia Gonski, Liangyu Wu, Qibin Liu, Wonyong Chung.

**Figure 1.** Bilevel detector optimization framework diagram, indicating geometry-based outer loop

**Figure 2.** Optimization results for the detector geometry: crystal offset and crystal width, using SNR

**Figure 3.** Optimization results for the detector geometry: crystal offset and crystal width, using SNR

**Figure 4.** Optimization results for the digitization parameters.

Original abstract

We present the first implementation of AI agents into the design and optimization of detectors in high-energy physics experiments via a bilevel optimization framework that vertically integrates detector geometry, front-end digitization, and high-level reconstruction algorithm parameters in differentiable full simulations. Using the example of a dual-readout, segmented crystal EM calorimeter with a baseline resolution of $3\%/\sqrt{E}$, we investigate the capabilities and value propositions of AI agents in the identification and reduction of key detector parameters and in the nonlinear traversal of a given detector design's full parameter space. We find that LLM-based reasoning models today, without being given additional experiment-specific context, are able to effectively execute complex workflows and proactively suggest generic but relevant avenues for further study or improvement. Here, we demonstrate an AI agent's ability to use the workflow to simultaneously optimize a representative subset of vertically integrated detector parameters: crystal granularity and length, number of ADC bits and sampling rate, and center-of-gravity hit-clustering radius. We find that effective integration of agents into the complex workflows of frontier areas of research not only significantly reduces labor and compute, but opens up efficient avenues for computational validation of first-principles design choices. While the ability to make autonomous leaps of physics-motivated judgment or insight is not demonstrated in this work, this study defines the current frontier of experimental design methods in high-energy physics.
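Two of the tuned quantities named in the abstract, the number of ADC bits and the center-of-gravity clustering radius, can be illustrated with a short sketch. It assumes an ideal uniform ADC and a simple seed-plus-radius clustering rule; the paper's actual digitization and clustering code is not reproduced on this page, so the function names, the `Hit` layout, and the example values are hypothetical.

```python
import math
from typing import List, Tuple

Hit = Tuple[float, float, float]  # (x, y, energy), arbitrary units


def quantize(amplitude: float, n_bits: int, full_scale: float) -> float:
    """Ideal uniform ADC: clip to [0, full_scale], round to one of 2**n_bits codes."""
    if n_bits < 1 or full_scale <= 0:
        raise ValueError("n_bits must be >= 1 and full_scale must be positive")
    steps = 2 ** n_bits - 1  # number of intervals between the 2**n_bits codes
    clipped = min(max(amplitude, 0.0), full_scale)
    return round(clipped / full_scale * steps) / steps * full_scale


def center_of_gravity(hits: List[Hit], radius: float) -> Tuple[float, float, float]:
    """Energy-weighted centroid of the hits within `radius` of the highest-energy hit."""
    seed_x, seed_y, _ = max(hits, key=lambda h: h[2])
    selected = [h for h in hits if math.hypot(h[0] - seed_x, h[1] - seed_y) <= radius]
    total = sum(e for _, _, e in selected)
    if total <= 0:
        raise ValueError("selected hits must carry positive total energy")
    cx = sum(x * e for x, _, e in selected) / total
    cy = sum(y * e for _, y, e in selected) / total
    return cx, cy, total


if __name__ == "__main__":
    demo_hits = [(0.0, 0.0, 6.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (5.0, 5.0, 1.0)]
    # Digitize each hit energy with a coarse 4-bit ADC before clustering.
    digitized = [(x, y, quantize(e, n_bits=4, full_scale=8.0)) for x, y, e in demo_hits]
    print(center_of_gravity(digitized, radius=1.5))
```

Fewer ADC bits coarsen each hit energy, and a larger radius pulls more distant hits into the centroid, which is why the two parameters interact when tuned together.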

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This is a proof-of-concept for wiring LLM agents into a bilevel optimization loop over a full differentiable detector simulation, but the results stay qualitative with no benchmarks or baselines.

The paper's main new element is the first reported use of LLM-based agents to run a vertically integrated bilevel optimization across geometry, digitization, and reconstruction parameters inside a differentiable HEP simulation. They take a dual-readout segmented crystal calorimeter with a 3%/√E baseline and let the agent adjust crystal granularity, length, ADC bits, sampling rate, and clustering radius in one workflow. The framing is clean: the agent operates without extra experiment-specific context and can flag generic next steps for study. That integration of agentic reasoning with a full simulation chain is the concrete advance over prior separate uses of differentiable tools or standalone optimizers in detector design.

The technical setup itself looks coherent on the description given. Differentiable full simulations are a natural fit for gradient-based search, and routing the agent through the bilevel structure lets it traverse the combined parameter space without manual stitching of stages. This could genuinely lower the labor of exploring design choices once the pipeline is built.

The soft spot is the missing evidence. The text states that the agents execute the workflows effectively and suggest relevant improvements, yet it supplies no numbers on iteration count, convergence behavior, final resolution improvement over the baseline, task completion rate, or comparison to non-agent methods such as direct gradient descent or manual tuning. Without those, the assertions about reduced labor and compute rest on a single qualitative run rather than measurable gains.

The work is aimed at instrumentation groups already running or building differentiable simulators who want to test AI assistance in the loop. A reader in that niche can extract the workflow structure and try it on their own setup. It deserves peer review because the integration idea is new enough to warrant referee scrutiny, even though the current draft needs added quantitative validation and clearer evaluation criteria before it can be treated as a finished method.

Referee Report

3 major / 3 minor

Summary. The manuscript claims to present the first implementation of AI agents in a bilevel optimization framework for detector design in high-energy physics. It vertically integrates detector geometry, front-end digitization, and high-level reconstruction parameters within differentiable full simulations. Demonstrated on a dual-readout segmented crystal EM calorimeter with baseline resolution 3%/√E, LLM-based agents are shown to execute workflows optimizing crystal granularity and length, ADC bits, sampling rate, and clustering radius, with the agents able to proactively suggest improvements without experiment-specific context; the work concludes that this reduces labor/compute and enables computational validation of design choices, while noting that autonomous physics insight is not demonstrated.

Significance. If the central claims are supported by quantitative evidence, the approach could meaningfully advance phenomenological detector optimization in HEP by enabling efficient traversal of high-dimensional parameter spaces that combine geometry, electronics, and reconstruction. The vertical integration via differentiable simulations and agentic workflows is a novel methodological contribution that could lower the cost of exploring design trade-offs. The absence of benchmarks, however, currently limits the assessed significance to a proof-of-concept demonstration rather than a validated new capability.

major comments (3)

[Abstract] Abstract and results demonstration: the claims that agents 'effectively execute complex workflows' and 'significantly reduces labor and compute' rest on a single qualitative run with no reported quantitative metrics (task success rate, number of iterations, final resolution vs. the stated 3%/√E baseline, or comparison to manual/traditional optimization baselines).
[Methodology] Bilevel optimization description: the framework is stated to operate on externally defined simulation parameters, but no explicit equations, pseudocode, or convergence criteria are supplied showing how the upper-level agent decisions interact with the lower-level differentiable simulation or how gradients are propagated across the full chain (geometry to digitization to reconstruction).
[Results] Parameter optimization results: the simultaneous optimization of granularity, length, ADC bits, sampling rate, and clustering radius is presented without error bars, statistical validation, or sensitivity analysis, making it impossible to assess whether the reported improvements are robust or merely anecdotal.
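For orientation on the second major comment, the generic textbook form of a bilevel problem is written out here. It is not taken from the manuscript: the symbols $F$, $f$, $\theta$, $\phi$ and the reading of $\theta$ as outer-level (for example geometry) parameters and $\phi$ as inner-level (for example digitization and reconstruction) parameters are assumptions made for illustration.

$$
\min_{\theta \in \Theta} F\bigl(\theta, \phi^{*}(\theta)\bigr)
\quad \text{subject to} \quad
\phi^{*}(\theta) \in \arg\min_{\phi \in \Phi} f(\theta, \phi)
$$

If $\phi^{*}(\theta)$ is differentiable, the outer gradient follows from the chain rule:

$$
\frac{dF}{d\theta}
= \frac{\partial F}{\partial \theta}
+ \left(\frac{d\phi^{*}}{d\theta}\right)^{\top}
\left.\frac{\partial F}{\partial \phi}\right|_{\phi = \phi^{*}(\theta)}
$$

Whether the outer level in the paper uses such a gradient or relies on agent-proposed updates is exactly the detail the referee asks the authors to specify.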

minor comments (3)

[Abstract] The title and abstract use 'Agentic-AI' without a concise definition or reference; a brief clarification in the introduction would aid readers unfamiliar with the term.
[Results] Specific examples of the agent's ability to 'proactively suggest generic but relevant avenues' would strengthen the narrative; quoting or tabulating one or two agent-generated suggestions would make the claim more concrete.
[Introduction] The manuscript would benefit from a short related-work paragraph situating the bilevel agent approach against prior ML-assisted detector optimization studies.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The comments highlight important areas where the manuscript can be strengthened to better support its claims as a proof-of-concept. We address each major comment below and commit to revisions that add clarity, formalism, and quantitative support without overstating the current demonstration.

Referee: [Abstract] Abstract and results demonstration: the claims that agents 'effectively execute complex workflows' and 'significantly reduces labor and compute' rest on a single qualitative run with no reported quantitative metrics (task success rate, number of iterations, final resolution vs. the stated 3%/√E baseline, or comparison to manual/traditional optimization baselines).

Authors: We agree that the abstract and results section would benefit from quantitative backing. In the revised manuscript we will report the number of agent iterations, a simple success metric for workflow completion, the final achieved resolution relative to the 3%/√E baseline, and a qualitative but explicit comparison of labor and compute effort versus a manual optimization workflow. These additions will be placed in both the abstract and a new subsection of the results.

Revision: yes

Referee: [Methodology] Bilevel optimization description: the framework is stated to operate on externally defined simulation parameters, but no explicit equations, pseudocode, or convergence criteria are supplied showing how the upper-level agent decisions interact with the lower-level differentiable simulation or how gradients are propagated across the full chain (geometry to digitization to reconstruction).

Authors: The referee is correct that the current text lacks formal specification. We will add a dedicated subsection containing (i) the bilevel optimization objective in mathematical form, (ii) pseudocode for the agent–simulation loop, and (iii) a description of gradient flow through the vertically integrated chain. Convergence criteria used in the demonstration will also be stated explicitly.

Revision: yes

Referee: [Results] Parameter optimization results: the simultaneous optimization of granularity, length, ADC bits, sampling rate, and clustering radius is presented without error bars, statistical validation, or sensitivity analysis, making it impossible to assess whether the reported improvements are robust or merely anecdotal.

Authors: We acknowledge that the presented results are from a single illustrative run. The revised results section will include error bars derived from repeated simulations where computationally feasible, a brief sensitivity study on the most influential parameters, and an explicit statement that the work is intended as a proof-of-concept rather than a statistically exhaustive benchmark. Full statistical validation across many random seeds will be noted as future work.

Revision: partial
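The error bars from repeated simulations promised in the last response are commonly summarized as a mean and standard error over random seeds. The sketch below shows one way this could look; `run_once` is a hypothetical stand-in for a full optimization run, and the numbers it returns are arbitrary toy values, not results from the paper.

```python
import random
import statistics
from typing import Callable, List, Tuple


def run_once(seed: int) -> float:
    """Hypothetical stand-in for one seeded run; returns an arbitrary toy metric."""
    rng = random.Random(seed)
    return 1.0 + rng.gauss(0.0, 0.1)


def summarize(metric: Callable[[int], float], seeds: List[int]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) of the metric over the given seeds."""
    if len(seeds) < 2:
        raise ValueError("need at least two seeds to estimate a spread")
    values = [metric(s) for s in seeds]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


if __name__ == "__main__":
    mean, sem = summarize(run_once, list(range(10)))
    print(f"metric = {mean:.3f} +/- {sem:.3f} (10 seeds)")
```

Reporting the spread alongside each optimized parameter would let a reader judge whether an apparent improvement exceeds run-to-run variation.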

Circularity Check

0 steps flagged

No circularity: framework uses external parameters and qualitative demonstration

The paper presents a bilevel optimization framework that vertically integrates detector geometry, digitization, and reconstruction parameters inside differentiable simulations, using an example dual-readout calorimeter with a stated baseline resolution of 3%/√E. All optimized quantities (granularity, length, ADC bits, sampling rate, clustering radius) are externally defined inputs to the simulation rather than quantities derived or fitted inside the same equations. No predictions are renamed as outputs of a fit, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled through prior work. The demonstration is described as a qualitative exploration of agent workflows; the central claim therefore remains self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that differentiable full simulations faithfully capture detector response across geometry, digitization, and reconstruction layers; no new physical entities are introduced, and the work adds a new workflow rather than new fitted constants.

axioms (1)

Domain assumption: Differentiable full simulations can accurately integrate detector geometry, front-end digitization, and high-level reconstruction algorithm parameters.
This assumption is required for the bilevel optimization to produce physically meaningful results and is invoked throughout the described framework.

pith-pipeline@v0.9.0 · 5556 in / 1252 out tokens · 32507 ms · 2026-05-08T13:00:06.339771+00:00 · methodology
