pith. machine review for the scientific record.

arxiv: 2605.10480 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: no theorem link

ASIA: an Autonomous System Identification Agent

Dario Piga, Marco Forgione

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:57 UTC · model grok-4.3

classification 💻 cs.AI
keywords system identification · large language models · autonomous agents · dynamical models · model selection · hyperparameter tuning

The pith

A large language model can serve as an autonomous agent to discover dynamical models by iterating hypotheses, code, and evaluations from only a plain-English problem description.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ASIA as a way to automate the trial-and-error process in system identification, where experts normally choose model types and tune parameters by hand. Using a large language model as a coding agent, the framework takes a natural-language description of the task and runs the full cycle of proposing models, implementing them, evaluating performance, and refining them, with no further human input. The authors run experiments on two standard benchmarks to see what architectures and strategies the agent finds and how good the final models are. They also highlight practical issues such as the risk of the model indirectly seeing test data and the loss of clearly documented methodological steps.
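The cycle described above can be sketched as a simple best-so-far search loop. Everything below is illustrative: the function names (`propose`, `implement`, `evaluate`) and the `Candidate` record are stand-ins, not ASIA's actual interface.

```python
# Hypothetical sketch of a propose -> implement -> evaluate -> refine loop,
# the kind of cycle an agentic identification framework automates.
from dataclasses import dataclass


@dataclass
class Candidate:
    description: str   # natural-language hypothesis for the model class
    code: str          # generated implementation (e.g. a training script)
    score: float       # validation error; lower is better


def identification_loop(task_description, propose, implement, evaluate, budget=30):
    """Iterate hypothesis -> code -> evaluation, keeping the best candidate."""
    best = None
    history = []
    for _ in range(budget):
        hypothesis = propose(task_description, history)  # LLM suggests a model class
        code = implement(hypothesis)                     # LLM writes the code
        score = evaluate(code)                           # run on validation data only
        cand = Candidate(hypothesis, code, score)
        history.append(cand)                             # feed back into next proposal
        if best is None or cand.score < best.score:
            best = cand
    return best
```

The `history` argument is what makes the loop agentic rather than a blind random search: each proposal can condition on everything tried so far.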

Core claim

ASIA closes the loop between hypothesis, implementation, and evaluation in system identification without human intervention, requiring only a plain-English description of the identification problem.

What carries the argument

The ASIA framework that lets a large language model function as an autonomous coding agent for model discovery and training in dynamical systems.

If this is right

  • The agent explores different model architectures and training strategies on its own.
  • Resulting models are evaluated for quality on system identification benchmarks.
  • The approach reduces the expert time needed for empirical tuning.
  • Limitations include possible test data leakage and challenges to reproducibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be applied to automate modeling in related fields such as control systems or time-series forecasting.
  • Future versions might incorporate safeguards to ensure the agent does not access hidden test data during its search.
  • Transparency tools could be added to log the agent's reasoning steps for better scientific validation.

Load-bearing premise

That the large language model can carry out effective iterative model search and discovery on system identification tasks without causing test leakage or reducing methodological transparency.

What would settle it

Testing the agent on a fresh benchmark dataset where no training examples overlap with its possible knowledge and verifying whether it still produces models whose accuracy matches or exceeds that of manually tuned ones.
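That decisive comparison reduces to scoring the agent-discovered model and a hand-tuned baseline on the same fresh test set. A minimal sketch, assuming generic prediction sequences and an RMSE criterion (the paper's figures report RMSE and MAE); all names here are illustrative:

```python
# Sketch of the settling experiment: does the agent's model match or beat a
# manually tuned baseline on a held-out test set neither could have seen?
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def settles_it(agent_pred, baseline_pred, y_test, tolerance=0.0):
    """True if the agent's model matches or exceeds the baseline's accuracy."""
    return rmse(y_test, agent_pred) <= rmse(y_test, baseline_pred) + tolerance
```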

Figures

Figures reproduced from arXiv: 2605.10480 by Dario Piga, Marco Forgione.

Figure 1. Cascaded tanks: normalized validation RMSE vs. ASIA iterations. RMSE obtained at each iteration (points) and best value achieved up to that iteration (line); annotated tags mark the main modifications introduced where a new best is reached.
Figure 3. Nanodrone: comparison between random search (blue) and agentic AI (orange) in terms of validation MAE vs. iterations. MAE at each iteration (points) and best value achieved up to that point (solid lines).
read the original abstract

Over the years, research in system identification has provided a rich set of methods for learning dynamical models, together with well-established theoretical guarantees. In practice, however, the choice of model class, training algorithm, and hyperparameter tuning is still largely left to empirical trial-and-error, requiring substantial expert time and domain experience. Motivated by recent advances in agentic artificial intelligence, we present ASIA, a framework that delegates this iterative search to a large language model acting as an autonomous coding agent. Building on existing agentic platforms, ASIA closes the loop between hypothesis, implementation, and evaluation without human intervention, requiring only a plain-English description of the identification problem. We conduct an empirical study of ASIA on two system identification benchmarks and analyse the agent's search behaviour, the architectures and training strategies it discovers, and the quality of the resulting models. We also discuss the potential of the approach and its current limitations, including implicit test leakage, reduced methodological transparency, and reproducibility concerns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ASIA, a framework that employs a large language model as an autonomous coding agent to perform system identification. Given only a plain-English description of the identification problem, the agent iteratively generates hypotheses, implements models in code, evaluates them, and refines the search without human intervention. The paper reports an empirical study on two system identification benchmarks, analyzing the agent's search behavior, the architectures and training strategies discovered, and the quality of the resulting models, while discussing limitations including implicit test leakage, reduced methodological transparency, and reproducibility concerns.

Significance. If the empirical results hold under proper controls, ASIA could automate a labor-intensive aspect of dynamical modeling that currently relies on expert trial-and-error, potentially accelerating research in control and system identification by leveraging recent agentic AI capabilities. The work provides a concrete demonstration of closing the hypothesis-implementation-evaluation loop from natural language input alone.

major comments (2)
  1. [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.
  2. [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.
minor comments (1)
  1. [Abstract] The two benchmarks used in the empirical study are not named in the abstract or early sections; naming them (and providing brief descriptions) would improve clarity for readers unfamiliar with the specific identification tasks.
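One of the controls major comment 2 asks for, isolating the agent from the test split while logging every data access, could look like the following guard. This is a hypothetical sketch; nothing like it is described in the paper.

```python
# Hypothetical data-access guard: the coding agent may read train/val splits,
# but only a trusted evaluator role may touch the test split, and every
# access is logged for post-hoc auditing.
class GuardedDataset:
    def __init__(self, splits):
        self._splits = splits      # e.g. {"train": ..., "val": ..., "test": ...}
        self.access_log = []       # (role, split) pairs, in access order

    def get(self, split, *, role):
        """Return a split, recording who asked; block agent reads of 'test'."""
        self.access_log.append((role, split))
        if split == "test" and role != "evaluator":
            raise PermissionError(f"{role!r} attempted to read the test split")
        return self._splits[split]
```

The audit log is the point: even if the agent never triggers the `PermissionError`, the log gives a verifiable record that no test-split read occurred during the search.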

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive review of our manuscript introducing ASIA. We appreciate the referee's focus on strengthening the empirical claims and limitations discussion. Below we respond point-by-point to the major comments and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Empirical study / Abstract] Empirical study section (and abstract): the central claim that ASIA autonomously closes the loop 'without human intervention' rests on an empirical evaluation whose supporting evidence is not quantified. No performance metrics, baseline comparisons, or model-quality statistics are reported for the two benchmarks, leaving it impossible to assess whether discovered models are competitive or merely functional.

    Authors: We thank the referee for identifying this important gap. The current manuscript provides a qualitative description of the agent's search behavior, the model architectures and training strategies discovered, and a high-level assessment of model quality on the two benchmarks, but it does not include explicit quantitative performance metrics (such as normalized mean squared error or fit percentages), statistical summaries, or comparisons against standard system identification baselines. This does limit the ability to judge competitiveness. We will revise the empirical study section (and update the abstract accordingly) to report concrete performance statistics for the discovered models on both benchmarks, include baseline comparisons where feasible, and add tables or figures summarizing model quality. These additions will directly support the autonomy and effectiveness claims. revision: yes

  2. Referee: [Limitations / Empirical study] Limitations discussion and empirical evaluation: implicit test leakage is explicitly listed as a current limitation, yet the manuscript provides no description of controls (agent isolation from test splits, prompt auditing, data-access logging, or post-hoc verification) nor any quantitative bound on leakage risk. This directly weakens the 'plain-English input only' and autonomy assertions, as inadvertent exposure to evaluation data cannot be ruled out.

    Authors: We agree that the limitations section is currently too brief on this issue. While the manuscript flags implicit test leakage as a concern, it does not describe any implemented controls, auditing procedures, or attempt to bound the risk. We will substantially expand the limitations discussion to detail the experimental setup (including environment isolation and prompt construction practices), any post-experiment verification steps performed, and an honest assessment of remaining leakage pathways and their potential impact on the 'plain-English input only' claim. Where quantitative bounds are not available, we will explain the reasons and outline mitigation strategies for future work. revision: yes
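The metrics the rebuttal promises are standard and cheap to report. A sketch of normalized RMSE and the fit percentage commonly used in system identification, fit = 100 · (1 − ‖y − ŷ‖ / ‖y − ȳ‖), using plain Python lists:

```python
# Standard system-identification quality metrics on plain sequences.
import math


def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the true output."""
    n = len(y_true)
    mean = sum(y_true) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    std = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return rmse / std


def fit_percent(y_true, y_pred):
    """100 * (1 - ||y - yhat|| / ||y - ybar||): 100 is perfect, 0 matches the mean."""
    mean = sum(y_true) / len(y_true)
    num = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)))
    den = math.sqrt(sum((t - mean) ** 2 for t in y_true))
    return 100.0 * (1.0 - num / den)
```

Note the two are linearly related (fit = 100 · (1 − nrmse)), so reporting either, per benchmark, would address the referee's point.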

Circularity Check

0 steps flagged

No circularity in empirical framework description

full rationale

The paper describes an agentic LLM framework for system identification and reports empirical results on two benchmarks. No mathematical derivations, equations, fitted parameters, or first-principles claims appear in the provided text or abstract. The central assertions rest on external benchmark evaluations rather than any self-referential reduction, self-citation chain, or ansatz smuggled via prior work. No load-bearing step reduces a prediction to its own inputs by construction, satisfying the default expectation of no significant circularity for non-derivational empirical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the unproven capability of current LLMs to act as reliable autonomous agents for this task, with no explicit free parameters or mathematical axioms stated.

invented entities (1)
  • ASIA autonomous coding agent · no independent evidence
    purpose: To perform iterative hypothesis, code implementation, and evaluation for system identification without human input
    The agent is the core proposed entity, but the abstract provides no independent falsifiable evidence beyond the described empirical study.

pith-pipeline@v0.9.0 · 5457 in / 1187 out tokens · 43901 ms · 2026-05-12T04:57:08.393787+00:00 · methodology

