Dr.Sai: An agentic AI for real-world physics analysis at BESIII
Pith reviewed 2026-05-08 09:17 UTC · model grok-4.3
The pith
Dr.Sai, an LLM-powered multi-agent system, autonomously translates natural language into complete physics analysis workflows and reproduces established J/psi branching fraction measurements at BESIII.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dr.Sai is an LLM-powered multi-agent system that translates natural language into rigorous physics workflows. As validation, Dr.Sai performed large-scale re-measurements of ten J/psi decay branching fractions without manual coding. It successfully navigated the real BESIII computing environment and produced results matching established benchmarks. The article details Dr.Sai's architecture, the validation results, and performance evaluation.
What carries the argument
Dr.Sai, the LLM-powered multi-agent system that interprets natural language tasks, generates code for HEP tools such as ROOT and BOSS, and executes the full workflow in the target computing environment.
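The generate-then-execute pattern described here can be illustrated with a minimal sketch. It is not Dr.Sai's actual architecture, which this page does not specify: `run_with_retries` and `stub_generator` are hypothetical names, the stub stands in for an LLM call, and plain Python scripts stand in for ROOT or BOSS jobs. The only idea shown is feeding the error output of a failed run back into the next generation attempt.

```python
import subprocess
import sys


def run_with_retries(task, generate_code, max_attempts=3, timeout=30):
    """Ask a generator for a script, run it, and retry with the error text on failure."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = generate_code(task, feedback)
        # Run the generated script in a separate interpreter process.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if result.returncode == 0:
            return {"attempts": attempt, "stdout": result.stdout, "code": code}
        # Pass the traceback to the next generation attempt.
        feedback = result.stderr
    raise RuntimeError(f"no working script after {max_attempts} attempts")


def stub_generator(task, feedback):
    """Hypothetical stand-in for an LLM: buggy first draft, fixed once it sees an error."""
    if not feedback:
        return "print(undefined_name)"
    return "print('events selected: 42')"


outcome = run_with_retries("count selected events", stub_generator)
print(outcome["attempts"], outcome["stdout"].strip())
```

A zero exit code only shows that the script ran, not that its physics is correct, which is why the referee's request for independent verification mechanisms matters.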
If this is right
- Large-scale systematic scans of multiple decay channels become feasible without a proportional increase in human labor.
- Analysis results can be reproduced and cross-checked more consistently by directing the same agent with identical instructions.
- The interval between data taking and final physics results can be reduced by automating the workflow generation and execution steps.
- The same architecture supplies a template for autonomous analysis pipelines in other data-heavy domains.
Where Pith is reading between the lines
- Similar agent systems could be deployed at other particle physics facilities that share comparable data processing chains.
- Adding iterative feedback from partial results might allow the agent to refine selection criteria or fit procedures on its own.
- The method could accelerate studies of low-rate processes that are currently impractical to analyze at scale by hand.
- Integration with version-controlled analysis repositories could let the agent produce auditable, reusable workflows as a standard output.
Load-bearing premise
Large language models can accurately interpret complex scientific tasks, generate error-free code for specialized HEP tools, and manage real computing-environment quirks without undetected biases or failures.
What would settle it
Applying Dr.Sai to an independent set of J/psi decay channels or a different experiment and obtaining branching fraction values that systematically deviate from independent measurements beyond statistical uncertainties.
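One way to make "systematically deviate" concrete is to compute a pull per channel and a chi-square over all channels. The sketch below assumes the measured and reference values are uncorrelated, so their uncertainties add in quadrature. The function names are hypothetical and the numbers are invented for illustration; they are not taken from the paper or from the PDG.

```python
import math


def pull(measured, sigma_measured, reference, sigma_reference):
    """Difference in units of the combined (quadrature) uncertainty."""
    combined = math.hypot(sigma_measured, sigma_reference)
    return (measured - reference) / combined


def chi2_against_reference(channels):
    """Return (chi2, ndf, pulls); ndf is the number of channels compared."""
    pulls = [pull(*channel) for channel in channels]
    return sum(p * p for p in pulls), len(pulls), pulls


# Hypothetical (measured, sigma_measured, reference, sigma_reference) tuples.
channels = [
    (1.02e-3, 0.03e-3, 1.00e-3, 0.04e-3),
    (2.10e-3, 0.05e-3, 2.00e-3, 0.05e-3),
]
chi2, ndf, pulls = chi2_against_reference(channels)
mean_pull = sum(pulls) / len(pulls)
print(f"chi2/ndf = {chi2:.2f}/{ndf}, mean pull = {mean_pull:.2f}")
```

A mean pull far from zero across many channels would point to a common bias, while a large chi-square with a mean pull near zero would point to underestimated uncertainties or channel-specific failures.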
Original abstract
High Energy Physics (HEP) experiments like BESIII produce petabyte-scale data. Extracting physics results requires complex workflows (simulation, reconstruction, statistical analysis, etc.) that traditionally take experts months or years. Current manual methods are labor-intensive, prone to bias, and limit large-scale systematic scans. As data grows, this paradigm slows discovery. Large Language Models (LLMs) offer a solution. Their natural language understanding and code generation capabilities allow them to interpret scientific tasks and integrate with HEP tools (e.g., ROOT, BOSS) to act as an "AI partner" for autonomous analysis. We present Dr.Sai, an LLM-powered multi-agent system that translates natural language into rigorous physics workflows. As validation, Dr.Sai performed large-scale re-measurements of ten J/psi decay branching fractions - without manual coding. It successfully navigated the real BESIII computing environment and produced results matching established benchmarks. The article details Dr.Sai's architecture, the validation results, and performance evaluation. This work provides a blueprint for autonomous discovery, with relevance to other data-intensive fields like astronomy and genomics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Dr.Sai, an LLM-powered multi-agent system intended to automate complex high-energy physics analysis workflows at BESIII. The central claim is that Dr.Sai translates natural-language task descriptions into complete, executable pipelines (including BOSS/ROOT jobs for simulation, reconstruction, efficiency corrections, background subtraction, and fits) and successfully re-measured branching fractions for ten J/psi decay channels in the real BESIII computing environment, yielding results that match established benchmarks without any manual coding or human intervention.
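For orientation, the quantity such a pipeline ultimately produces can be written with the generic textbook relation B = (N_obs - N_bkg) / (N_J/psi × efficiency × product of intermediate branching fractions). The sketch below is that generic relation only, not the paper's documented procedure. The function name, the inputs, and the simplification that the statistical error comes solely from Poisson fluctuations of the observed count are assumptions.

```python
import math


def branching_fraction(n_observed, n_background, n_jpsi, efficiency,
                       intermediate_bf=1.0):
    """Generic branching-fraction estimate with a Poisson-only statistical error."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    n_signal = n_observed - n_background            # background-subtracted yield
    denominator = n_jpsi * efficiency * intermediate_bf
    value = n_signal / denominator
    # Assumption: the background estimate carries no uncertainty of its own.
    stat_error = math.sqrt(n_observed) / denominator
    return value, stat_error


# Hypothetical inputs chosen for easy arithmetic, not BESIII numbers.
bf, err = branching_fraction(
    n_observed=10_000, n_background=400, n_jpsi=1.0e9, efficiency=0.32
)
print(f"B = {bf:.3e} +/- {err:.1e} (stat)")
```

Each input (yield, background, efficiency, J/psi count) is itself the output of a workflow step, so an error in any generated step propagates directly into the final value.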
Significance. If the autonomy and correctness claims hold with full reproducibility, the work would be significant for HEP by showing a practical route to reduce the months-to-years effort required for standard analyses, enable broader systematic scans, and minimize workflow biases. The reported integration with live experimental infrastructure (rather than toy environments) is a concrete strength. The approach could also serve as a template for agentic systems in other data-intensive domains. However, the absence of released artifacts and quantitative validation metrics substantially limits the immediate scientific impact and verifiability of these contributions.
major comments (2)
- [Validation results section] The manuscript states that Dr.Sai produced branching-fraction values matching established benchmarks for ten channels, but supplies no quantitative agreement metrics (e.g., differences from PDG values, combined uncertainties, or fit-quality measures), no statistics on attempt success/failure rates, and no description of how LLM hallucinations or code errors were detected and corrected. These details are load-bearing for assessing whether the system reliably navigated real-data complexities.
- [Reproducibility and system description] The claim of fully autonomous execution without manual coding is central, yet the paper releases neither the natural-language prompts, the LLM-generated scripts and job files, the execution logs, nor the final output files for the ten channels. Without these artifacts, independent verification of the workflow autonomy and absence of hidden human guidance is impossible.
minor comments (1)
- [Abstract and performance-evaluation section] The abstract and performance-evaluation section would benefit from explicit definitions of the success criteria used to declare a workflow 'successful' and from clearer notation distinguishing agent-generated code from any template or helper scripts.
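The attempt success/failure statistics requested in the first major comment could be reported with an interval instead of a bare fraction. The sketch below uses the Wilson score interval for a binomial proportion. The function name and the example counts are hypothetical, and the choice of the Wilson interval is an assumption, not something the manuscript or the referee specifies.

```python
import math


def wilson_interval(successes, attempts, z=1.96):
    """Wilson score interval for a success rate (z=1.96 gives roughly 95% coverage)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must lie between 0 and attempts")
    p = successes / attempts
    denom = 1.0 + z * z / attempts
    centre = (p + z * z / (2 * attempts)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical example: 8 fully successful workflow runs out of 10 attempts.
low, high = wilson_interval(8, 10)
print(f"success rate 0.80, 95% interval [{low:.2f}, {high:.2f}]")
```

With only a handful of runs per channel the interval is wide, which is one reason the definition of a "successful" workflow and the number of independent attempts both need to be stated.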
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We agree that additional details on validation metrics and reproducibility are crucial for establishing the reliability and verifiability of Dr.Sai. Below we provide point-by-point responses and indicate the revisions we will make.
- Referee: [Validation results section] The manuscript states that Dr.Sai produced branching-fraction values matching established benchmarks for ten channels, but supplies no quantitative agreement metrics (e.g., differences from PDG values, combined uncertainties, or fit-quality measures), no statistics on attempt success/failure rates, and no description of how LLM hallucinations or code errors were detected and corrected. These details are load-bearing for assessing whether the system reliably navigated real-data complexities.
Authors: We acknowledge that the current version of the manuscript lacks explicit quantitative metrics and detailed error-handling descriptions, which are important for a rigorous assessment. In the revised manuscript, we will add a dedicated subsection in the validation results with a table of measured branching fractions versus PDG values, including relative differences, uncertainties, and goodness-of-fit measures such as chi-squared per degree of freedom. We will also report success rates from multiple independent runs of the agent system and describe the built-in verification mechanisms, including cross-checks by specialized agents for code correctness and consistency with physics expectations. This will strengthen the evidence for autonomous navigation of real-data complexities without misrepresenting the original claims. revision: yes
- Referee: [Reproducibility and system description] The claim of fully autonomous execution without manual coding is central, yet the paper releases neither the natural-language prompts, the LLM-generated scripts and job files, the execution logs, nor the final output files for the ten channels. Without these artifacts, independent verification of the workflow autonomy and absence of hidden human guidance is impossible.
Authors: We agree that releasing the artifacts is essential for full reproducibility and independent verification. Although the initial submission emphasized the system description and results to highlight the conceptual advance, we will include the natural-language prompts, sample LLM-generated scripts, execution logs (anonymized where necessary), and output files as supplementary material or host them in a public GitHub repository linked in the revised manuscript. This will enable others to inspect the autonomy of the process. We note that the multi-agent architecture includes logging of all steps, which facilitates this release. revision: yes
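A released artifact bundle is easier to verify if it ships with a checksum manifest. The sketch below walks a directory of prompts, generated scripts, logs, and outputs and records a SHA-256 digest per file. The function names, the manifest layout, and the directory structure are assumptions for illustration; the rebuttal does not describe any particular packaging.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large logs or ROOT files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root with its relative path, size, and SHA-256 digest."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries


def write_manifest(root, output):
    """Write the manifest as JSON and return it."""
    manifest = build_manifest(root)
    Path(output).write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest
```

Calling `write_manifest("artifacts/", "MANIFEST.json")` on a hypothetical `artifacts/` directory would let a reader confirm that the prompts and scripts they inspect are byte-identical to the ones used in the reported runs.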
Circularity Check
No circularity: empirical validation of agentic system
The paper describes an LLM-based multi-agent architecture (Dr.Sai) that translates natural-language tasks into BESIII workflows and validates it by re-measuring ten J/ψ branching fractions, reporting agreement with PDG benchmarks. No derivation chain, equations, fitted parameters, or first-principles predictions exist that could reduce to inputs by construction. The central claim is an empirical demonstration of autonomous execution in a real computing environment; any self-citations (if present) support tool integration or prior LLM capabilities but are not load-bearing for the reported results. The validation is falsifiable against external benchmarks and does not rely on self-definition or renaming of known results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents guided by multi-agent orchestration can reliably translate natural-language physics tasks into correct, executable code for domain tools such as ROOT and BOSS
invented entities (1)
- Dr.Sai multi-agent system: no independent evidence