Large Language Model-Assisted Framework for BSM Model Building
Pith reviewed 2026-06-26 13:58 UTC · model grok-4.3
The pith
The bsm_agent framework builds BSM models automatically from natural-language descriptions of new fields, with an LLM handling only the interface and a Python backend executing all symbolic calculations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from the SM field content and a user-specified set of additional scalars and/or fermions, the package constructs the renormalizable Lagrangian, performs gauge-anomaly checks, expands operators into component fields, and derives electroweak symmetry breaking stationary conditions and tree-level mass matrices. All of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface, eliminating the need for manual model construction. The symbolic calculations are performed entirely by the Python backend to ensure the correctness and reproducibility of the physics results; the LLM is used only as an orchestration layer.
What carries the argument
The LLM orchestration layer that interprets natural-language requests, manages confirmation steps for ambiguous inputs, triggers backend tools, and formats report-ready summaries, separated from the deterministic Python backend that executes all symbolic calculations.
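A minimal sketch of this separation, under stated assumptions: the LLM layer is assumed to emit a JSON tool call, and a plain Python dispatcher runs the matching deterministic function. The registry `TOOLS`, the tool name `electric_charges`, the message format, and the charge convention Q = T3 + Y are all hypothetical illustrations and are not taken from bsm_agent.

```python
import json
from fractions import Fraction
from typing import Callable, Dict, List

# Hypothetical registry of deterministic backend tools; the names are
# illustrative and are not taken from bsm_agent.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Decorator that registers a backend function under a tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("electric_charges")
def electric_charges(su2_dim: int, hypercharge: str) -> List[str]:
    """Component charges of an SU(2) multiplet, assuming Q = T3 + Y."""
    y = Fraction(hypercharge)
    j = Fraction(su2_dim - 1, 2)  # weak isospin of the multiplet
    # T3 runs from +j down to -j in unit steps.
    return [str(j - k + y) for k in range(su2_dim)]


def dispatch(llm_message: str) -> object:
    """Execute the tool named in a JSON tool call produced by the LLM layer."""
    call = json.loads(llm_message)
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# A message of the kind an orchestration layer might emit (format assumed).
message = '{"tool": "electric_charges", "arguments": {"su2_dim": 2, "hypercharge": "1/2"}}'
print(dispatch(message))  # prints ['1', '0']
```

In this arrangement the language model only chooses a tool and its arguments. The numbers themselves come from exact rational arithmetic, so the same tool call always yields the same result regardless of which provider produced it.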
If this is right
- Quantum numbers of new scalars and fermions are supplied conversationally rather than through manual code entry.
- Renormalizable Lagrangian construction, gauge-anomaly checks, operator expansion, and mass-matrix derivation all occur without further user coding after the initial description.
- The same backend can be driven by local Ollama models, self-hosted servers, or commercial APIs while producing identical physics output.
- Stationary conditions and tree-level mass matrices are generated in report-ready form for immediate use in further calculations.
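To make the last item concrete, here is a small SymPy sketch of stationary conditions and a tree-level mass matrix. The model is an illustrative choice and is not taken from the paper: the CP-even Higgs background field `h` plus a real singlet `s` with a Z2-symmetric potential, vacuum values `v` and `w`, and assumed coupling names `lam`, `lamS`, `lamHS`. Whether bsm_agent organizes the calculation this way is not stated in the source.

```python
import sympy as sp

# Background fields and their vacuum expectation values (names assumed).
h, s = sp.symbols("h s", real=True)
v, w = sp.symbols("v w", positive=True)
mu2, muS2 = sp.symbols("mu2 muS2", real=True)
lam, lamS, lamHS = sp.symbols("lambda lambda_S lambda_HS", real=True)

# Illustrative tree-level potential for the CP-even fields.
V = (-mu2 * h**2 / 2 + lam * h**4 / 4
     - muS2 * s**2 / 2 + lamS * s**4 / 4
     + lamHS * h**2 * s**2 / 4)

vev = {h: v, s: w}

# Stationary conditions: first derivatives vanish at the vacuum.
tadpoles = [sp.diff(V, field).subs(vev) for field in (h, s)]

# Trade the mass parameters for the vacuum values.
solution = sp.solve(tadpoles, [mu2, muS2], dict=True)[0]

# Tree-level mass matrix: Hessian of V at the vacuum, with the
# stationary conditions imposed.
mass_matrix = sp.hessian(V, (h, s)).subs(vev).subs(solution)
mass_matrix = mass_matrix.applyfunc(sp.simplify)

sp.pprint(solution)
sp.pprint(mass_matrix)
```

For this potential the stationary conditions give mu2 = lambda v^2 + lambda_HS w^2 / 2 and muS2 = lambda_S w^2 + lambda_HS v^2 / 2, and the mass matrix has diagonal entries 2 lambda v^2 and 2 lambda_S w^2 with off-diagonal entry lambda_HS v w.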
Where Pith is reading between the lines
- The same split between conversational control and deterministic computation could be reused in other symbolic physics packages to reduce manual setup time.
- Confirmation steps already built into the LLM layer provide a practical way to catch interpretation mistakes before the backend runs.
- Extending the backend with additional modules such as one-loop corrections would immediately make those capabilities available through the same natural-language interface.
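The confirmation step in the list above could be sketched as a gate that lets the backend run only on field content the user has explicitly approved. This is a hypothetical design and not the bsm_agent implementation: the class `ConfirmationGate`, the dictionary keys, and the SHA-256 fingerprint are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List


def fingerprint(fields: List[Dict[str, object]]) -> str:
    """Stable hash of the parsed field content (canonical JSON, sorted keys)."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ConfirmationGate:
    """Hypothetical gate: the backend runs only on user-approved input."""

    def __init__(self) -> None:
        self._approved = set()

    def summarize(self, fields: List[Dict[str, object]]) -> str:
        """Human-readable echo of what the LLM layer parsed."""
        return "\n".join(
            f"{f['name']}: {f['spin']}, (SU(3), SU(2), Y) = "
            f"({f['su3_dim']}, {f['su2_dim']}, {f['hypercharge']})"
            for f in fields
        )

    def approve(self, fields: List[Dict[str, object]]) -> str:
        """Record the user's approval of exactly this field content."""
        token = fingerprint(fields)
        self._approved.add(token)
        return token

    def run(self, fields: List[Dict[str, object]], backend: Callable) -> object:
        """Call the backend, refusing any content that was not approved."""
        if fingerprint(fields) not in self._approved:
            raise PermissionError("field content has not been confirmed by the user")
        return backend(fields)


gate = ConfirmationGate()
parsed = [{"name": "S", "spin": "scalar", "su3_dim": 1, "su2_dim": 1, "hypercharge": "0"}]
print(gate.summarize(parsed))  # shown to the user before anything is computed
gate.approve(parsed)           # the user says yes
print(gate.run(parsed, backend=len))  # stand-in backend: counts the fields
```

Because approval is tied to a hash of the exact content, any change made after the user has confirmed (for example a singlet silently becoming a doublet) fails the check instead of reaching the symbolic code.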
Load-bearing premise
The LLM correctly and unambiguously translates the user's natural-language description of quantum numbers into the precise inputs required by the backend without introducing parsing or interpretation errors.
What would settle it
An input phrase whose quantum numbers are misread by the LLM, producing a Lagrangian missing an interaction term or failing an anomaly check that the backend would otherwise catch.
Original abstract
Recent advances in artificial intelligence (AI), particularly large language models (LLMs), have created new opportunities for natural-language interaction with scientific software, but reliable theoretical model building still requires deterministic symbolic calculations. We present bsm_agent, an open-source symbolic framework for beyond the Standard Model (BSM) model building that combines a deterministic physics backend with an LLM chat interface. Starting from the SM field content and a user-specified set of additional scalars and/or fermions, the package constructs the renormalizable Lagrangian, performs gauge-anomaly checks, expands operators into component fields, and derives electroweak symmetry breaking stationary conditions and tree-level mass matrices. The key novelty of the framework is that all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface, eliminating the need for manual model construction. The symbolic calculations are performed entirely by the Python backend to ensure the correctness and reproducibility of the physics results; the LLM is used only as an orchestration layer that interprets natural-language requests, manages confirmation steps for ambiguous inputs, triggers backend tools, and formats report-ready summaries. The package supports three provider classes: local Ollama inference, remote self-hosted model servers accessed through the implemented remote provider interface, and commercial hosted APIs via OpenAI and Anthropic. This separation between conversational control and deterministic computation preserves reproducibility while making interactive BSM model construction substantially more convenient.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes bsm_agent, an open-source framework that pairs an LLM chat interface for natural-language specification of BSM field quantum numbers with a deterministic Python backend. The backend automatically constructs renormalizable Lagrangians, performs gauge-anomaly checks, expands operators to component fields, and derives EWSB conditions and tree-level mass matrices starting from the SM plus user-specified scalars/fermions. The LLM acts solely as an orchestration layer for input interpretation, ambiguity confirmation, tool triggering, and report formatting, while all symbolic physics is handled in Python for reproducibility; multiple LLM providers (local, self-hosted, commercial) are supported.
Significance. If the LLM-to-backend translation proves reliable, the framework would lower the barrier to exploratory BSM model construction while preserving reproducibility. The explicit separation of conversational control from deterministic computation is a clear design strength. However, the absence of any concrete examples, parsing-accuracy metrics, or validation against known models means the practical significance remains unestablished.
major comments (2)
- [Abstract] The central claim that 'all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface' and that 'the LLM is used only as an orchestration layer' is unsupported by any validation data, test cases, or error metrics. Without quantitative assessment of parsing accuracy for quantum numbers (representations, hypercharges, etc.), the reproducibility guarantee cannot be evaluated.
- [Abstract and implied implementation description] The architecture assumes the LLM will not produce confident but incorrect mappings (e.g., doublet vs. singlet or fractional hypercharge errors) that the backend then executes deterministically. No mechanism beyond 'confirmation steps for ambiguous inputs' is described to catch such errors, and no accuracy benchmarks are referenced.
Simulated Author's Rebuttal
We thank the referee for their detailed review and for highlighting the importance of empirical validation for the LLM orchestration layer. The comments correctly identify that the submitted manuscript provides limited quantitative support for the reliability claims. We address each point below and will revise the manuscript to incorporate additional test cases, accuracy metrics, and workflow clarifications as described.
Point-by-point responses
- Referee: [Abstract] The central claim that 'all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface' and that 'the LLM is used only as an orchestration layer' is unsupported by any validation data, test cases, or error metrics. Without quantitative assessment of parsing accuracy for quantum numbers (representations, hypercharges, etc.), the reproducibility guarantee cannot be evaluated.
  Authors: We agree that the abstract and current manuscript lack explicit quantitative validation of the LLM parsing step. The full text describes the separation of concerns and provides illustrative usage, but does not report systematic accuracy metrics or error rates. In the revision we will add a new section presenting a benchmark suite of quantum-number specifications (including representations, hypercharges, and multiplicities), measured parsing success rates across the supported LLM providers, and direct comparisons of the resulting Lagrangians against manually constructed reference models. This will allow readers to evaluate the reproducibility claim quantitatively.
  Revision: yes
- Referee: [Abstract and implied implementation description] The architecture assumes the LLM will not produce confident but incorrect mappings (e.g., doublet vs. singlet or fractional hypercharge errors) that the backend then executes deterministically. No mechanism beyond 'confirmation steps for ambiguous inputs' is described to catch such errors, and no accuracy benchmarks are referenced.
  Authors: The current description emphasizes user confirmation of the parsed quantum numbers prior to backend execution, which is intended to intercept mis-mappings before any symbolic computation occurs. However, we acknowledge that the manuscript does not detail the exact confirmation interface, does not quantify how often such errors arise, and provides no benchmark data on their detection rate. In the revision we will expand the implementation section with a step-by-step description of the confirmation workflow, include the benchmark results mentioned above (which will report both raw parsing error rates and post-confirmation residual error rates), and clarify that the deterministic backend operates exclusively on the user-approved field content.
  Revision: yes
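A parsing benchmark of the kind promised in the responses could look roughly like the following sketch. Everything here is hypothetical: the `FieldSpec` record, the three prompts, and especially `stub_parser`, which is a canned lookup with one deliberate doublet-versus-singlet slip standing in for a real LLM call. The resulting number illustrates the bookkeeping only and is not a measured result.

```python
from fractions import Fraction
from typing import Callable, List, NamedTuple, Tuple


class FieldSpec(NamedTuple):
    """Quantum numbers the backend would need for one new field (assumed schema)."""
    spin: str              # "scalar" or "fermion"
    su3_dim: int
    su2_dim: int
    hypercharge: Fraction


# Illustrative prompt/reference pairs; not taken from the paper.
CASES: List[Tuple[str, FieldSpec]] = [
    ("add a real scalar singlet with zero hypercharge",
     FieldSpec("scalar", 1, 1, Fraction(0))),
    ("add a scalar SU(2) triplet with hypercharge 1",
     FieldSpec("scalar", 1, 3, Fraction(1))),
    ("add a scalar leptoquark: colour triplet, weak doublet, Y = 7/6",
     FieldSpec("scalar", 3, 2, Fraction(7, 6))),
]


def stub_parser(prompt: str) -> FieldSpec:
    """Stand-in for an LLM parser: canned answers with one deliberate error."""
    canned = {
        CASES[0][0]: CASES[0][1],
        CASES[1][0]: CASES[1][1],
        # Deliberate mistake: the weak doublet is returned as a singlet.
        CASES[2][0]: FieldSpec("scalar", 3, 1, Fraction(7, 6)),
    }
    return canned[prompt]


def parsing_accuracy(parser: Callable[[str], FieldSpec],
                     cases: List[Tuple[str, FieldSpec]]) -> Fraction:
    """Fraction of prompts whose parsed spec matches the reference exactly."""
    hits = 0
    for prompt, reference in cases:
        try:
            if parser(prompt) == reference:
                hits += 1
        except Exception:
            pass  # a crash or refusal counts as a miss
    return Fraction(hits, len(cases))


print(parsing_accuracy(stub_parser, CASES))  # prints 2/3
```

Exact-match scoring is a deliberately strict choice here: a single wrong representation or hypercharge changes the Lagrangian, so partial credit would overstate reliability. Running the same cases once before and once after a user confirmation step would give the raw and residual error rates the authors describe.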
Circularity Check
No circularity: framework description with deterministic backend
The paper describes a software package (bsm_agent) that uses an LLM solely as an orchestration layer for natural-language input while delegating all symbolic calculations (Lagrangian construction, anomaly checks, mass matrices) to a deterministic Python backend. No derivations, equations, fitted parameters, predictions, or self-citations appear in the text. The central claim reduces to a description of implemented functionality rather than any mathematical reduction to its own inputs. This is a standard non-finding for tool/framework papers.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The Python backend correctly implements the Standard Model gauge group, field content, and rules for renormalizable operators and anomaly cancellation.
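One way to probe the anomaly-cancellation part of this assumption independently is a few lines of exact arithmetic. The sketch below is not the bsm_agent backend. It assumes all fermions are written as left-handed Weyl fields, the hypercharge normalization Q = T3 + Y, colour representations limited to the singlet, triplet, and antitriplet (encoded as 1, 3, -3), and it covers only the perturbative gauge and mixed gravitational anomalies.

```python
from fractions import Fraction as F
from typing import Dict, List, NamedTuple


class Weyl(NamedTuple):
    """A left-handed Weyl fermion multiplet (illustrative schema)."""
    name: str
    su3: int   # 1, 3, or -3 for the antitriplet
    su2: int   # dimension of the SU(2) representation
    Y: F       # hypercharge, with Q = T3 + Y


def su2_index(n: int) -> F:
    """Dynkin index of the n-dimensional SU(2) representation (1/2 for n = 2)."""
    return F(n * (n * n - 1), 12)


def su3_index(r: int) -> F:
    """Dynkin index for the colour singlet, triplet, or antitriplet."""
    return F(1, 2) if abs(r) == 3 else F(0)


def su3_cubic(r: int) -> int:
    """Cubic anomaly coefficient: +1 for 3, -1 for the antitriplet, 0 for 1."""
    return {1: 0, 3: 1, -3: -1}[r]


def anomalies(fields: List[Weyl]) -> Dict[str, F]:
    """Anomaly coefficients; every entry must vanish for consistency."""
    return {
        "U(1)^3": sum(abs(f.su3) * f.su2 * f.Y**3 for f in fields),
        "grav^2 U(1)": sum(abs(f.su3) * f.su2 * f.Y for f in fields),
        "SU(2)^2 U(1)": sum(abs(f.su3) * su2_index(f.su2) * f.Y for f in fields),
        "SU(3)^2 U(1)": sum(su3_index(f.su3) * f.su2 * f.Y for f in fields),
        "SU(3)^3": sum(su3_cubic(f.su3) * f.su2 for f in fields),
    }


# One Standard Model generation, right-handed fields written as conjugates.
SM_GENERATION = [
    Weyl("Q", 3, 2, F(1, 6)),
    Weyl("u^c", -3, 1, F(-2, 3)),
    Weyl("d^c", -3, 1, F(1, 3)),
    Weyl("L", 1, 2, F(-1, 2)),
    Weyl("e^c", 1, 1, F(1)),
]

print(all(a == 0 for a in anomalies(SM_GENERATION).values()))  # prints True

# A single chiral doublet spoils the cancellation ...
chiral = SM_GENERATION + [Weyl("psi", 1, 2, F(1, 2))]
print(anomalies(chiral)["U(1)^3"])  # prints 1/4

# ... while a vector-like pair does not.
vector_like = chiral + [Weyl("psi^c", 1, 2, F(-1, 2))]
print(all(a == 0 for a in anomalies(vector_like).values()))  # prints True
```

Using `Fraction` rather than floats keeps the cancellation exact, which matters because hypercharges such as 1/6 and 2/3 do not have finite binary representations.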