pith. machine review for the scientific record.

arxiv: 2605.10480 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: no theorem link

ASIA: an Autonomous System Identification Agent

Dario Piga, Marco Forgione

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:57 UTC · model grok-4.3

classification 💻 cs.AI
keywords system identification · large language models · autonomous agents · dynamical models · model selection · hyperparameter tuning

The pith

A large language model can serve as an autonomous agent to discover dynamical models by iterating hypotheses, code, and evaluations from only a plain-English problem description.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ASIA as a way to automate the trial-and-error process in system identification, where experts normally choose model types and tune parameters by hand. Using a large language model as a coding agent, the framework takes a natural-language description of the task and runs the full cycle of proposing models, implementing them, evaluating performance, and refining them, with no further human input. The authors run experiments on two standard benchmarks to see what architectures and strategies the agent finds and how good the final models are. They also highlight practical issues such as the risk of the model indirectly seeing test data and the loss of clearly documented methodological steps.
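The cycle described above can be sketched as a simple best-so-far search loop. Everything below is illustrative: the function names (`propose`, `implement`, `evaluate`) and the `Candidate` record are stand-ins, not ASIA's actual interface.

```python
# Hypothetical sketch of a propose -> implement -> evaluate -> refine loop,
# the kind of cycle an agentic identification framework automates.
from dataclasses import dataclass


@dataclass
class Candidate:
    description: str   # natural-language hypothesis for the model class
    code: str          # generated implementation (e.g. a training script)
    score: float       # validation error; lower is better


def identification_loop(task_description, propose, implement, evaluate, budget=30):
    """Iterate hypothesis -> code -> evaluation, keeping the best candidate."""
    best = None
    history = []
    for _ in range(budget):
        hypothesis = propose(task_description, history)  # LLM suggests a model class
        code = implement(hypothesis)                     # LLM writes the code
        score = evaluate(code)                           # run on validation data only
        cand = Candidate(hypothesis, code, score)
        history.append(cand)                             # feed back into next proposal
        if best is None or cand.score < best.score:
            best = cand
    return best
```

The `history` argument is what makes the loop agentic rather than a blind random search: each proposal can condition on everything tried so far.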

Core claim

ASIA closes the loop between hypothesis, implementation, and evaluation in system identification without human intervention, requiring only a plain-English description of the identification problem.

What carries the argument

The ASIA framework that lets a large language model function as an autonomous coding agent for model discovery and training in dynamical systems.

If this is right

  • The agent explores different model architectures and training strategies on its own.
  • Resulting models are evaluated for quality on system identification benchmarks.
  • The approach reduces the expert time needed for empirical tuning.
  • Limitations include possible test data leakage and challenges to reproducibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be applied to automate modeling in related fields such as control systems or time-series forecasting.
  • Future versions might incorporate safeguards to ensure the agent does not access hidden test data during its search.
  • Transparency tools could be added to log the agent's reasoning steps for better scientific validation.

Load-bearing premise

That the large language model can carry out effective iterative model search and discovery on system identification tasks without causing test leakage or reducing methodological transparency.

What would settle it

Testing the agent on a fresh benchmark dataset where no training examples overlap with its possible knowledge and verifying whether it still produces models whose accuracy matches or exceeds that of manually tuned ones.
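That decisive comparison reduces to scoring the agent-discovered model and a hand-tuned baseline on the same fresh test set. A minimal sketch, assuming generic prediction sequences and an RMSE criterion (the paper's figures report RMSE and MAE); all names here are illustrative:

```python
# Sketch of the settling experiment: does the agent's model match or beat a
# manually tuned baseline on a held-out test set neither could have seen?
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def settles_it(agent_pred, baseline_pred, y_test, tolerance=0.0):
    """True if the agent's model matches or exceeds the baseline's accuracy."""
    return rmse(y_test, agent_pred) <= rmse(y_test, baseline_pred) + tolerance
```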

Figures

Figures reproduced from arXiv: 2605.10480 by Dario Piga, Marco Forgione.

Figure 1. Cascaded tanks: normalized validation RMSE vs. ASIA iterations. RMSE obtained at each iteration (points) and best value achieved up to that iteration (line); annotated tags mark the main modifications introduced where a new best is reached.
Figure 3. Nanodrone: comparison between random search (blue) and agentic AI (orange) in terms of validation MAE vs. iterations. MAE at each iteration (points) and best value achieved up to that point (solid lines).
read the original abstract

Over the years, research in system identification has provided a rich set of methods for learning dynamical models, together with well-established theoretical guarantees. In practice, however, the choice of model class, training algorithm, and hyperparameter tuning is still largely left to empirical trial-and-error, requiring substantial expert time and domain experience. Motivated by recent advances in agentic artificial intelligence, we present ASIA, a framework that delegates this iterative search to a large language model acting as an autonomous coding agent. Building on existing agentic platforms, ASIA closes the loop between hypothesis, implementation, and evaluation without human intervention, requiring only a plain-English description of the identification problem. We conduct an empirical study of ASIA on two system identification benchmarks and analyse the agent's search behaviour, the architectures and training strategies it discovers, and the quality of the resulting models. We also discuss the potential of the approach and its current limitations, including implicit test leakage, reduced methodological transparency, and reproducibility concerns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ASIA, a framework that employs a large language model as an autonomous coding agent to perform system identification. Given only a plain-English description of the identification problem, the agent iteratively generates hypotheses, implements models in code, evaluates them, and refines the search without human intervention. The paper reports an empirical study on two system identification benchmarks, analyzing the agent's search behavior, the architectures and training strategies discovered, and the quality of the resulting models, while discussing limitations including implicit test leakage, reduced methodological transparency, and reproducibility concerns.

Significance. If the empirical results hold under proper controls, ASIA could automate a labor-intensive aspect of dynamical modeling that currently relies on expert trial-and-error, potentially accelerating research in control and system identification by leveraging recent agentic AI capabilities. The work provides a concrete demonstration of closing the hypothesis-implementation-evaluation loop from natural language input alone.

major comments (2)
  1. [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.
  2. [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.
minor comments (1)
  1. [Abstract] The two benchmarks used in the empirical study are not named in the abstract or early sections; naming them (and providing brief descriptions) would improve clarity for readers unfamiliar with the specific identification tasks.
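One of the controls major comment 2 asks for, isolating the agent from the test split while logging every data access, could look like the following guard. This is a hypothetical sketch; nothing like it is described in the paper.

```python
# Hypothetical data-access guard: the coding agent may read train/val splits,
# but only a trusted evaluator role may touch the test split, and every
# access is logged for post-hoc auditing.
class GuardedDataset:
    def __init__(self, splits):
        self._splits = splits      # e.g. {"train": ..., "val": ..., "test": ...}
        self.access_log = []       # (role, split) pairs, in access order

    def get(self, split, *, role):
        """Return a split, recording who asked; block agent reads of 'test'."""
        self.access_log.append((role, split))
        if split == "test" and role != "evaluator":
            raise PermissionError(f"{role!r} attempted to read the test split")
        return self._splits[split]
```

The audit log is the point: even if the agent never triggers the `PermissionError`, the log gives a verifiable record that no test-split read occurred during the search.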

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive review of our manuscript introducing ASIA. We appreciate the referee's focus on strengthening the empirical claims and limitations discussion. Below we respond point-by-point to the major comments and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.

    Authors: We thank the referee for identifying this important gap. The current manuscript provides a qualitative description of the agent's search behavior, the model architectures and training strategies discovered, and a high-level assessment of model quality on the two benchmarks, but it does not include explicit quantitative performance metrics (such as normalized mean squared error or fit percentages), statistical summaries, or comparisons against standard system identification baselines. This does limit the ability to judge competitiveness. We will revise the empirical study section (and update the abstract accordingly) to report concrete performance statistics for the discovered models on both benchmarks, include baseline comparisons where feasible, and add tables or figures summarizing model quality. These additions will directly support the autonomy and effectiveness claims. revision: yes

  2. Referee: [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.

    Authors: We agree that the limitations section is currently too brief on this issue. While the manuscript flags implicit test leakage as a concern, it does not describe any implemented controls, auditing procedures, or attempt to bound the risk. We will substantially expand the limitations discussion to detail the experimental setup (including environment isolation and prompt construction practices), any post-experiment verification steps performed, and an honest assessment of remaining leakage pathways and their potential impact on the 'plain-English input only' claim. Where quantitative bounds are not available, we will explain the reasons and outline mitigation strategies for future work. revision: yes
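The metrics the rebuttal promises are standard and cheap to report. A sketch of normalized RMSE and the fit percentage commonly used in system identification, fit = 100 · (1 − ‖y − ŷ‖ / ‖y − ȳ‖), using plain Python lists:

```python
# Standard system-identification quality metrics on plain sequences.
import math


def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the true output."""
    n = len(y_true)
    mean = sum(y_true) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    std = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return rmse / std


def fit_percent(y_true, y_pred):
    """100 * (1 - ||y - yhat|| / ||y - ybar||): 100 is perfect, 0 matches the mean."""
    mean = sum(y_true) / len(y_true)
    num = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)))
    den = math.sqrt(sum((t - mean) ** 2 for t in y_true))
    return 100.0 * (1.0 - num / den)
```

Note the two are linearly related (fit = 100 · (1 − nrmse)), so reporting either, per benchmark, would address the referee's point.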

Circularity Check

0 steps flagged

No circularity in empirical framework description

full rationale

The paper describes an agentic LLM framework for system identification and reports empirical results on two benchmarks. No mathematical derivations, equations, fitted parameters, or first-principles claims appear in the provided text or abstract. The central assertions rest on external benchmark evaluations rather than any self-referential reduction, self-citation chain, or ansatz smuggled via prior work. No load-bearing step reduces a prediction to its own inputs by construction, satisfying the default expectation of no significant circularity for non-derivational empirical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the unproven capability of current LLMs to act as reliable autonomous agents for this task, with no explicit free parameters or mathematical axioms stated.

invented entities (1)
  • ASIA autonomous coding agent · no independent evidence
    purpose: To perform iterative hypothesis, code implementation, and evaluation for system identification without human input
    The agent is the core proposed entity, but the abstract provides no independent falsifiable evidence beyond the described empirical study.

pith-pipeline@v0.9.0 · 5457 in / 1187 out tokens · 43901 ms · 2026-05-12T04:57:08.393787+00:00 · methodology

