From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models

Benoit Gaudou; Matthieu Mastio; Nicolas Verstaevel; Paul Saves

arXiv: 2604.03350 · v1 · submitted 2026-04-03 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-13 19:44 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: agent-based models · sensitivity analysis · machine learning surrogates · stochastic simulators · parameter screening · design of experiments · model exploration

The pith

A multi-stage pipeline of automated screening followed by machine learning surrogates allows systematic exploration of high-dimensional stochastic agent-based models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish a hands-off framework for sensitivity analysis and policy testing in high-dimensional, stochastic agent-based models (ABMs). It proceeds in two steps: first, an automated model-based screening identifies dominant variables, assesses outcome variability, and segments the parameter space; second, machine learning models are trained to map the remaining nonlinear interaction effects. This approach automates the discovery of unstable regions, where system outcomes depend on interactions among many variables. A sympathetic reader would care because traditional exploration is infeasible here: the number of parameter combinations grows exponentially with the number of parameters (the curse of dimensionality), and stochasticity means each combination must be simulated several times.

Core claim

On a predator-prey case study, the paper shows that a multi-stage pipeline combining the systematic design of experiments with machine learning surrogates can discover unstable regions automatically: the first stage identifies dominant variables and segments the parameter space, and the second stage trains models to capture the remaining nonlinear interactions.

What carries the argument

The two-stage pipeline where automated model-based screening identifies dominant variables and segments the parameter space before machine learning surrogates model nonlinear effects.
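To make the two-stage structure concrete, here is a minimal Python sketch under loudly stated assumptions. `toy_simulator` is a hypothetical stand-in for the predator-prey ABM (its 0.31 cutoff merely mimics the Figure 5 threshold), the zero-variance rule is a deliberately simplified segmentation rather than the paper's CART-based procedure, and the k-nearest-neighbour surrogate is a placeholder for whatever ML model the paper trains. Only the shape carries over: replicate, screen and segment, then fit a surrogate where outcomes stay uncertain.

```python
import math
import random
from statistics import fmean, pvariance

def toy_simulator(params, seed):
    """HYPOTHETICAL stand-in for a stochastic predator-prey ABM.

    params = (hunting_share, energy_gain); returns 1 for coexistence, 0 for extinction.
    """
    hunting_share, energy_gain = params
    rng = random.Random(seed)
    if hunting_share > 0.31:
        return 0  # deterministic extinction regime
    return 1 if rng.random() < energy_gain else 0  # stochastic regime

def stage1_screen(design, simulate, replicates):
    """Run replicates per design point and split points by outcome variance."""
    stable, unstable = [], []
    for i, x in enumerate(design):
        ys = [simulate(x, seed=i * replicates + r) for r in range(replicates)]
        mean, var = fmean(ys), pvariance(ys)
        # Zero variance across replicates: the outcome looks deterministic here.
        if var == 0:
            stable.append((x, mean))
        else:
            unstable.append((x, mean))
    return stable, unstable

def stage2_surrogate(unstable, k=3):
    """k-nearest-neighbour surrogate fitted only on the unstable segment."""
    def predict(x):
        nearest = sorted(unstable, key=lambda xm: math.dist(x, xm[0]))[:k]
        return fmean(m for _, m in nearest)
    return predict

design = [(h / 10, g / 10) for h in range(11) for g in range(3, 8)]
stable, unstable = stage1_screen(design, toy_simulator, replicates=20)
surrogate = stage2_surrogate(unstable)
print(len(stable), len(unstable), round(surrogate((0.15, 0.5)), 2))
```

Points above the hypothetical 0.31 hunting cutoff always go extinct, land in `stable`, and never reach the surrogate, so the stage 2 model only has to learn the stochastic segment.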

If this is right

The framework provides modelers with a rigorous hands-off method for sensitivity analysis.
Unstable regions in stochastic ABMs can be discovered without manual parameter tuning.
Policy testing becomes feasible even for high-dimensional simulators.
Outcome variability is assessed systematically during the screening phase.
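Assessing outcome variability needs replicates at each parameter point. The sketch below assumes a binary outcome (1 = coexistence, 0 = extinction) and uses only Python's `statistics` module to summarize replicates and to size them with the standard binomial bound $n \ge p(1-p)/\mathrm{SE}^2$; the function names are illustrative, not taken from the paper.

```python
import math
from statistics import fmean, stdev

def replicate_summary(outcomes):
    """Mean, sample standard deviation, and standard error over replicates (needs >= 2)."""
    mean = fmean(outcomes)
    sd = stdev(outcomes)
    return mean, sd, sd / math.sqrt(len(outcomes))

def replicates_needed(p, target_se):
    """Replicates so a binary-outcome rate near p has standard error <= target_se."""
    # Small epsilon guards against float round-up (e.g. 100.00000000000001 -> 101).
    return math.ceil(p * (1 - p) / target_se ** 2 - 1e-9)

mean, sd, se = replicate_summary([1, 0, 1, 1])
n_tipping = replicates_needed(0.5, 0.05)
```

For example, `replicates_needed(0.5, 0.05)` gives 100: near a tipping point, where the coexistence rate is close to 0.5, about 100 replicates are needed to pin the rate down to one standard error of 0.05.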

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could be tested on ABMs from fields like epidemiology or economics to see if similar unstable regions are found.
The segmented spaces might allow for more targeted data collection in future simulations.
Hybrid approaches like this may generalize to other types of stochastic models beyond agents.
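One way a segmented space could drive targeted data collection is uncertainty sampling: simulate next where a fitted classifier is least sure. The sketch assumes a hypothetical `predict_proba(x)` returning an estimated coexistence probability and scores candidates by the Bernoulli variance $p(1-p)$; it illustrates the editorial idea above, not the refined (V2) sampling strategy the paper actually uses.

```python
def select_refinement_points(candidates, predict_proba, k):
    """Return the k candidates with the largest Bernoulli variance p * (1 - p)."""
    def score(x):
        p = predict_proba(x)
        return p * (1 - p)
    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical classifier: coexistence probability equals the first coordinate.
candidates = [(i / 10, 0.5) for i in range(11)]
chosen = select_refinement_points(candidates, lambda x: x[0], k=3)
```

The chosen points cluster around a predicted probability of 0.5, i.e. along the estimated regime boundary, which is where extra simulations are most informative.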

Load-bearing premise

The automated model-based screening step can reliably identify dominant variables and segment the parameter space despite the inherent stochasticity of the agent-based simulator.
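The premise is easier to probe with the screening step spelled out. Below is a one-split regression stump, a minimal stand-in for the CART screening shown in Figure 5 (not the paper's code). It assumes `y` holds replicate-averaged outcome rates per design point, so averaging damps simulator noise before any split is chosen; it returns the dominant feature index, its threshold, and the remaining within-group squared error. The synthetic grid and its 0.31 rule are hypothetical.

```python
def best_stump(X, y):
    """One-level regression stump: (feature, threshold, sse) minimizing within-group SSE."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = (None, None, sse(y))  # no split yet
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed levels
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical replicate-averaged rates: coexistence only below a 0.31 cutoff on feature 0.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
rates = [1.0 if x0 < 0.31 else 0.0 for x0, _ in grid]
feature, threshold, remaining_sse = best_stump(grid, rates)
```

Here the stump picks feature 0 with a threshold of about 0.325 (halfway between the grid levels 0.30 and 0.35), and feature 1 is correctly ignored.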

What would settle it

Running multiple instances of the screening on the predator-prey model with varied random seeds and checking if the identified dominant variables and unstable regions remain consistent across runs.
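That consistency check is cheap to automate once each screening run reports variable importances. The sketch assumes each run yields a dict mapping variable names to importance scores and reports the mean pairwise Jaccard overlap of the top-k sets, plus the spread of a detected threshold. The importance values and thresholds in the example are invented for illustration; only the names PH, BG, and Gr come from the paper's figures.

```python
from itertools import combinations
from statistics import fmean

def topk_agreement(importance_runs, k):
    """Mean pairwise Jaccard overlap of the top-k variables across screening runs."""
    tops = [set(sorted(run, key=run.get, reverse=True)[:k]) for run in importance_runs]
    return fmean(len(a & b) / len(a | b) for a, b in combinations(tops, 2))

def threshold_spread(thresholds):
    """Range of a split threshold detected across runs (0 means perfectly stable)."""
    return max(thresholds) - min(thresholds)

# Invented importances from two hypothetical seeds.
runs = [
    {"PH": 0.6, "BG": 0.3, "Gr": 0.1},
    {"PH": 0.5, "Gr": 0.35, "BG": 0.15},
]
agreement = topk_agreement(runs, k=2)
spread = threshold_spread([0.30, 0.31, 0.33])
```

An agreement of 1 means identical top-k sets; here it is 1/3 because the top variable is stable across seeds but the second slot flips between BG and Gr.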

Figures

Figures reproduced from arXiv: 2604.03350 by Benoit Gaudou, Matthieu Mastio, Nicolas Verstaevel, Paul Saves.

**Figure 1.** Multi-stage exploration workflow. The methodology transitions from a coarse model-based screening to a fine-grained data-driven analysis. Initially, we assess internal stochasticity using linear and simple global screening to identify dominant drivers and interpretable extraction rules methods to segment stability regions locally. We then deepen this analysis by training, if needed, one or many machine le…

**Figure 2.** PDP/ICE and uncertainty variations for the 6 most important features.

**Figure 3.** Simulation results showing the relationship between variables X and Y.

**Figure 4.** Global Regime Dynamics. Comparison of simulation outcome distributions between the initial broad exploration (V1) and the refined sampling strategy (V2). The dominance of the Extinction regime highlights the system’s structural vulnerability.

**Figure 5.** Model-Based Screening (Phase 1). A Decision Tree (CART) trained on the initial dataset. It identifies the primary anthropogenic tipping point: a deterministic shift toward extinction when the Proportion of Hunting Zones ($P_H$) exceeds ≈ 31%.

**Figure 6.** Global Sensitivity Analysis (Phase 2). (a) Sobol’ Indices ($S_1$ vs. $S_T$): comparison of First-Order and Total-Order indices reveals that variance is driven by interactions rather than single parameters. (b) Interaction Heatmap: the heatmap highlights the strong coupling between Bandicoot Energy Gain (BG) and Grassland Availability (Gr).
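The gap between first-order and total-order indices in Figure 6(a) is what signals interactions. A minimal pure-Python sketch of the standard Monte Carlo estimators (Saltelli's 2010 first-order estimator and Jansen's total-order estimator, assuming independent inputs uniform on $[0,1]^d$) shows the effect on two hypothetical toy functions; it is not the paper's implementation.

```python
import random
from statistics import fmean

def sobol_indices(f, d, n, seed=0):
    """First-order (Saltelli 2010) and total-order (Jansen) Sobol' indices.

    Assumes d independent inputs, each uniform on [0, 1].
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    pooled = fA + fB
    mean = fmean(pooled)
    total_var = fmean((v - mean) ** 2 for v in pooled)
    first, total = [], []
    for i in range(d):
        # A_B^(i): rows of A whose i-th column is taken from B.
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        first.append(fmean(fb * (fab - fa) for fa, fb, fab in zip(fA, fB, fABi)) / total_var)
        total.append(fmean((fa - fab) ** 2 for fa, fab in zip(fA, fABi)) / (2 * total_var))
    return first, total

additive = sobol_indices(lambda x: x[0] + x[1], d=2, n=20000, seed=1)
interaction = sobol_indices(lambda x: (x[0] - 0.5) * (x[1] - 0.5), d=2, n=20000, seed=2)
```

For the additive function both indices sit near 0.5, so total-order roughly equals first-order; for the pure product, the first-order indices are near 0 while the total-order indices are near 1, because all of the variance comes from the interaction.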

**Figure 7.** Nonlinear Response and Mechanisms. Partial Dependence Plots (black lines) overlaid with Individual Conditional Expectation curves (colored lines). The coloring by BG reveals a “Metabolic Trap”: increasing resources (Gr) only promotes coexistence if metabolic efficiency (BG) is sufficiently high (red lines).
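The “Metabolic Trap” reading relies on ICE curves disagreeing with their average. The sketch computes ICE curves and the PDP (their pointwise mean) for any callable model; `toy_model` is hypothetical and only mimics the qualitative pattern, with resources helping only when efficiency is high.

```python
from statistics import fmean

def ice_curves(model, X, feature, grid):
    """One curve per row: model output as `feature` sweeps `grid`, other inputs fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            x = list(row)
            x[feature] = value
            curve.append(model(x))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """PDP: pointwise average of the ICE curves."""
    return [fmean(column) for column in zip(*curves)]

# HYPOTHETICAL toy response: resources (index 0) help only when efficiency (index 1) is high.
def toy_model(x):
    resources, efficiency = x
    return resources if efficiency > 0.5 else 0.0

X = [[0.3, 0.2], [0.7, 0.9]]
grid = [0.0, 0.5, 1.0]
ice = ice_curves(toy_model, X, feature=0, grid=grid)
pdp = partial_dependence(ice)
```

The low-efficiency row yields a flat ICE curve and the high-efficiency row a rising one, while the PDP shows a moderate average slope that describes neither; coloring ICE curves by the second variable is what exposes this.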

**Figure 8.** Phase Transition Zones. A map of the Total Euclidean Uncertainty, $\sigma_{\text{total}} = \sqrt{\sigma^2_{\text{aleatoric}} + \sigma^2_{\text{epistemic}}}$, regarding the probability of coexistence. Peaks in uncertainty identify the precise location of tipping points where the ecosystem is structurally unstable.
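The caption's total uncertainty can be computed from an ensemble of surrogates. The sketch assumes one common decomposition, not necessarily the paper's: each ensemble member outputs a coexistence probability $p_m$, aleatoric variance is the mean Bernoulli variance $p_m(1-p_m)$, and epistemic variance is the spread of the $p_m$ across members.

```python
import math
from statistics import fmean, pvariance

def uncertainty_decomposition(member_probs):
    """Split ensemble uncertainty on a coexistence probability.

    Returns (aleatoric_sd, epistemic_sd, total_sd), with
    total_sd = sqrt(aleatoric_var + epistemic_var).
    """
    aleatoric_var = fmean(p * (1 - p) for p in member_probs)  # mean Bernoulli variance
    epistemic_var = pvariance(member_probs)                     # disagreement between members
    return (
        math.sqrt(aleatoric_var),
        math.sqrt(epistemic_var),
        math.sqrt(aleatoric_var + epistemic_var),
    )

confident = uncertainty_decomposition([0.02, 0.03, 0.01])
tipping = uncertainty_decomposition([0.2, 0.8])
```

By the law of total variance, the two terms add up to $\bar p(1-\bar p)$, so $\sigma_{\text{total}}$ peaks at $\bar p = 0.5$, exactly where the model cannot tell extinction from coexistence; this is why its peaks mark candidate tipping points.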

Original abstract

Systematic exploration of Agent-Based Models (ABMs) is challenged by the curse of dimensionality and their inherent stochasticity. We present a multi-stage pipeline integrating the systematic design of experiments with machine learning surrogates. Using a predator-prey case study, our methodology proceeds in two steps. First, an automated model-based screening identifies dominant variables, assesses outcome variability, and segments the parameter space. Second, we train Machine Learning models to map the remaining nonlinear interaction effects. This approach automates the discovery of unstable regions where system outcomes are highly dependent on nonlinear interactions between many variables. Thus, this work provides modelers with a rigorous, hands-off framework for sensitivity analysis and policy testing, even when dealing with high-dimensional stochastic simulators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

A practical two-stage screening-plus-surrogate workflow for stochastic ABMs that holds up in the predator-prey demonstration without major gaps.

This paper lays out a two-stage workflow for exploring stochastic agent-based models: first, an automated screening step finds dominant variables and segments the parameter space; then machine learning surrogates capture the remaining nonlinear effects. The predator-prey case study shows the method working, with multiple replicates to account for stochasticity and variance-aware segmentation to identify unstable regions. What is new is the integrated pipeline, which combines model-based screening with surrogates specifically for high-dimensional stochastic simulators. It automates the discovery of regions where outcomes depend heavily on interactions among many variables, which addresses a real pain point in sensitivity analysis and policy exploration for these models.

The paper does well to keep the approach practical and hands-off. The screening reduces the problem size before surrogate training, and the case study handles variability explicitly through replicates. On the reported results, the central claim of a rigorous framework for stochastic ABMs holds up.

Soft spots are limited. Validation is confined to one case study, so the paper would benefit from additional examples or from comparisons with existing design-of-experiments (DOE) and surrogate techniques. However, there are no load-bearing flaws, circular arguments, or unaddressed assumptions that undermine the workflow as described.

This work is aimed at modelers in ecology, the social sciences, or similar areas who run stochastic agent-based simulations and need tools for systematic exploration without excessive manual intervention. Readers focused on applied computational methods will get the most from the concrete steps and the demonstration. The paper shows clear thinking about the challenges of dimensionality and stochasticity, with honest engagement through the case study. It deserves a serious referee because the method is well specified and the evidence from the predator-prey example supports the claims without gaps.
I recommend sending it to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript presents a multi-stage workflow for systematic exploration of stochastic agent-based models (ABMs). It begins with an automated model-based screening step that identifies dominant variables, assesses outcome variability, and segments the parameter space, followed by training machine learning surrogates to capture remaining nonlinear interaction effects. The approach is demonstrated on a predator-prey case study that incorporates multiple replicates to handle stochasticity and uses variance-aware segmentation. The central claim is that this pipeline supplies modelers with a rigorous, hands-off framework for sensitivity analysis and policy testing in high-dimensional stochastic simulators.

Significance. If the reported results hold, the workflow addresses a practical challenge in ABM analysis by automating sensitivity screening and surrogate construction in the presence of stochasticity and high dimensionality. The explicit use of replicates and variance-aware segmentation in the predator-prey case study is a constructive element that could improve reliability over purely deterministic screening methods. The combination of model-based and data-driven stages offers a potentially useful template for modelers working with complex simulators where exhaustive enumeration is infeasible.

minor comments (3)

[Abstract] The abstract states that the workflow "automates the discovery of unstable regions" but supplies no quantitative metrics from the predator-prey case study (e.g., surrogate prediction error, the fraction of parameter space identified as unstable, or a comparison against exhaustive sampling); adding one or two such numbers would strengthen it.
[Methodology] The description of the ML surrogate training step should specify the exact algorithms (e.g., random forest, Gaussian process, neural network), the hyperparameter selection procedure, and any cross-validation or uncertainty quantification used to ensure that the surrogates faithfully reproduce the segmented nonlinear effects.
[Case Study] Figures showing segmented parameter regions would benefit from explicit statistical summaries (mean, variance, and replicate count per segment) and from a direct comparison of screening runtime against full-factorial enumeration to quantify the claimed efficiency gain.
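The requested per-segment summaries and the full-factorial baseline are simple to compute. A sketch under assumed inputs: `labels` assigns each simulation run to a segment (for instance a leaf of the screening tree) and `outcomes` holds the corresponding results; `full_factorial_runs` counts the runs an exhaustive grid would need. All names and example numbers are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, variance

def segment_summary(labels, outcomes):
    """Replicate count, mean, and sample variance of the outcome in each segment."""
    groups = defaultdict(list)
    for label, y in zip(labels, outcomes):
        groups[label].append(y)
    return {
        label: {
            "n": len(ys),
            "mean": fmean(ys),
            "var": variance(ys) if len(ys) > 1 else float("nan"),
        }
        for label, ys in groups.items()
    }

def full_factorial_runs(levels_per_parameter, replicates):
    """Number of simulations for an exhaustive grid with replicates at every point."""
    return math.prod(levels_per_parameter) * replicates

summary = segment_summary(["extinction", "extinction", "mixed", "mixed"], [0, 0, 1, 0])
runs = full_factorial_runs([5] * 6, replicates=10)
```

As a hypothetical illustration, six parameters at five levels each with 10 replicates already require $5^6 \times 10 = 156{,}250$ runs, the kind of baseline the screening runtime would be compared against.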

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the workflow's practical value in handling stochasticity and high dimensionality, and recommendation for minor revision. The explicit use of replicates and variance-aware segmentation is indeed central to reliability, and we are glad this was noted as a constructive element.

Circularity Check

0 steps flagged

No significant circularity in the multi-stage workflow

The paper describes a forward methodological pipeline: automated model-based screening to identify dominant variables and segment the parameter space, followed by training ML surrogates on the reduced space. No equations, derivations, or fitted quantities are presented that reduce by construction to inputs defined within the same pipeline. The predator-prey case study illustrates the workflow with explicit handling of stochasticity via replicates, providing independent empirical support rather than self-referential definitions or load-bearing self-citations. The central claim of a hands-off framework is therefore self-contained and can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unproven premise that model-based screening can isolate dominant variables in stochastic simulators and that standard ML models can then faithfully represent the remaining nonlinear interactions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

Domain assumption: Automated model-based screening can reliably identify dominant variables and segment the parameter space in the presence of stochasticity.
Invoked as the first step of the pipeline without supporting evidence in the abstract.
Domain assumption: Machine-learning surrogates can accurately capture nonlinear interaction effects once the parameter space has been segmented.
Assumed in the second stage without reported validation metrics.
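The second assumption can be checked by reporting cross-validated accuracy. A self-contained sketch of k-fold cross-validated $R^2$, with a k-nearest-neighbour regressor standing in for whatever surrogate the paper trains (an assumption, as are the fold and neighbour counts and the toy data):

```python
import math
import random
from statistics import fmean

def knn_predict(train_X, train_y, x, k):
    """Average outcome of the k training points nearest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    return fmean(y for _, y in nearest)

def kfold_r2(X, y, n_folds=5, k=3, seed=0):
    """Out-of-fold R^2 of the kNN surrogate."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    preds = [0.0] * len(X)
    for f in range(n_folds):
        held_out = set(order[f::n_folds])
        train = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    mean_y = fmean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy smooth response on a 20 x 20 grid (hypothetical data).
X = [(i / 19, j / 19) for i in range(20) for j in range(20)]
y = [a + 2 * b for a, b in X]
score = kfold_r2(X, y)
```

On this smooth toy response the score is close to 1; on a real ABM, a low score within a segment would signal that the surrogate does not faithfully capture that segment's nonlinear effects.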

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1] Angione, C., Silverman, E., Yaneske, E.: Using machine learning as a surrogate model for agent-based simulations. PLOS One 17 (2022)

[2] Banos, A., Caillou, P., Gaudou, B., Marilleau, N.: Agent-based model exploration. In: Agent-based Spatial Simulation with NetLogo, pp. 125–181. Elsevier (2015)

[3] Chen, J.J., Tsai, C.A., Moon, H., Ahn, H., Young, J.J., Chen, C.H.: Decision threshold adjustment in class prediction. SAR and QSAR in Environmental Research 17(3), 337–352 (2006)

[4] De Bosscher, B., Ziabari, S.S.M., Sharpanskykh, A.: Towards a better understanding of agent-based airport terminal operations using surrogate modeling. In: MABS (2023)

[5] Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E.: Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics 24(1), 44–65 (2015)

[6] Goodfellow, I.: Deep learning (2016)

[7] Grimm, V., Railsback, S.F.: Individual-based modeling and ecology. In: Individual-based modeling and ecology. Princeton University Press (2013)

[8] Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning 110(3), 457–506 (2021)

[9] Iooss, B., Lemaître, P.: A review on global sensitivity analysis methods. Uncertainty management in simulation-optimization of complex systems: algorithms and applications, pp. 101–122 (2015)

[10] Lamperti, F., Mandel, A., Napoletano, M., Sapio, A., Roventini, A., Balint, T., Khorenzhenko, I.: Towards agent-based integrated assessment models: examples, challenges, and future developments. Reg. Env. Change 19(3), 747–762 (2019)

[11] Mastio, M., Saves, P., Gaudou, B., Verstaevel, N.: Adaptive agents in spatial double-auction markets: Modeling the emergence of industrial symbiosis. In: Proceedings of AAMAS 2026. vol. 2026, pp. 1–10. IFAAMAS (2026)

[12] Montgomery, D.C.: Design and analysis of experiments. John Wiley & Sons (2017)

[13] Novak, M., Wilensky, U.: NetLogo Bug Hunt Predators and Invasive Species (2011)

[14] Railsback, S.F., Grimm, V.: Agent-based and individual-based modeling: a practical introduction. Princeton University Press (2019)

[15] Saves, P., Hallé-Hannan, E., Bussemaker, J., Diouane, Y., Bartoli, N.: Modeling hierarchical spaces: A review and unified framework for surrogate-based architecture design. Structural and Multidisciplinary Optimization (2026)

[16] Saves, P., Palar, P.S., Robani, M.D., Verstaevel, N., Garouani, M., Aligon, J., Gaudou, B., Shimoyama, K., Morlier, J.: Surrogate modeling and explainable artificial intelligence for complex systems: A workflow for automated simulation exploration. arXiv preprint arXiv:2510.16742 (2025)

[17] Shafer, G., Vovk, V.: A tutorial on conformal prediction. Journal of Machine Learning Research 9(3) (2008)

[18] Thiele, J.C., Kurth, W., Grimm, V.: Facilitating parameter estimation and sensitivity analysis of agent-based models: A cookbook using NetLogo and R. Journal of Artificial Societies and Social Simulation 17(3), 11 (2014)

[19] Tumer, K., Ghosh, J.: Estimating the Bayes error rate through classifier combining. In: Proc. of ICPR ’96. vol. 2, pp. 695–699. IEEE (1996)

[20] Wilensky, U.: NetLogo: Center for connected learning and computer-based modeling. Northwestern University, Evanston, IL 4952 (1999)

Supplementary Materials: Generated Figures. Fig. 3: Simulation results showing the relationship between variables X and Y. (a) Initial Exploratory Batch (N = 3,250) (b) Refined Fo...
