Why Model Credibility Isn't Enough: Rethinking Trust in Simulation Architectures

Adeline Lanugue; Anthony Levillain; Boussaad Soualmi; Cedric Leclerc; Cristian Maxim; Julien Silande; Maxime Hayet; Rim Kaddah; Romain Barbedienne

arxiv: 2606.17593 · v1 · pith:5UNVNSGUnew · submitted 2026-06-16 · 💻 cs.SE

Pith reviewed 2026-06-26 23:58 UTC · model grok-4.3

classification 💻 cs.SE

keywords: simulation credibility, assembly credibility, sensitivity analysis, expert analysis, explainable AI, network analysis, simulation architectures, trust assessment

The pith

The credibility of a simulation architecture cannot be assessed from the credibilities of its component models alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper asks whether the credibility of an assembled simulation architecture follows directly from the credibility of its individual models. It surveys state-of-the-art methods for assessing assembly credibility, comparing sensitivity analysis, expert qualitative analysis, AI explainability, and network approaches, and evaluates them on rigor, generalization, and resource requirements to show their strengths and weaknesses. This matters because simulations in practice are rarely standalone, and incorrect assumptions about assembled credibility could lead to flawed decisions in critical applications.

Core claim

The paper claims that the credibility of a simulation architecture cannot necessarily be assessed based on the credibility of the models that comprise it. To address this, it provides an overview of the current state of the art in assembly credibility by comparing sensitivity analysis techniques, qualitative analysis by experts, explainability in AI, and networks, and assesses the proposed approaches based on criteria such as rigor, generalization, and resource requirements to reveal their strengths and weaknesses.
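A minimal sketch of how this can happen, using a toy example that is not taken from the paper. The two models, their 4% biases, and the 5% acceptance threshold are all assumptions chosen for illustration. Each model passes its own accuracy check in isolation, yet the error compounds when one model's output feeds the other.

```python
# Toy illustration (assumed, not from the paper): two chained models that
# each pass a 5% accuracy check, while the assembled chain does not.

TOLERANCE = 0.05  # hypothetical acceptance threshold on relative error


def true_a(x: float) -> float:
    """Reference behaviour of component A (hypothetical)."""
    return x


def model_a(x: float) -> float:
    """Simulation model of A with a 4% multiplicative bias."""
    return 1.04 * x


def true_b(y: float) -> float:
    """Reference behaviour of component B: strongly nonlinear in its input."""
    return y ** 5


def model_b(y: float) -> float:
    """Simulation model of B with a 4% multiplicative bias."""
    return 1.04 * y ** 5


def rel_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


x = 2.0

# Model-level checks: each model is compared with its reference on a correct input.
err_a = rel_error(model_a(x), true_a(x))
err_b = rel_error(model_b(true_a(x)), true_b(true_a(x)))

# Assembly-level check: model B now receives model A's biased output.
err_assembly = rel_error(model_b(model_a(x)), true_b(true_a(x)))

for name, err in [("model A", err_a), ("model B", err_b), ("assembly", err_assembly)]:
    verdict = "pass" if err < TOLERANCE else "FAIL"
    print(f"{name:9s} relative error = {err:.3f} -> {verdict}")
```

In this toy chain the assembly error is 1.04^6 - 1, roughly 26.5%, because B's fifth-power response amplifies A's bias before B adds its own. Real couplings can amplify, damp, or cancel errors, which is why the assembly has to be examined as a whole.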

What carries the argument

Comparison of four assembly credibility assessment approaches (sensitivity analysis, expert qualitative analysis, AI explainability, and networks) against rigor, generalization, and resource requirements.

If this is right

Each approach exhibits unique strengths and weaknesses in evaluating assembly credibility.
Sensitivity analysis offers quantitative measures but demands high computational effort.
Expert analysis provides contextual insight but may lack formal rigor.
AI explainability techniques can scale to complex systems but vary in generalization.
Network methods highlight inter-model dependencies but require further validation for credibility purposes.
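To make the computational-cost point about sensitivity analysis concrete, here is a simplified one-at-a-time screening sketch in the spirit of elementary-effects methods. It is not the full Morris trajectory design and not the paper's procedure. The `assembly` function, its three parameters, and the settings `r` and `delta` are hypothetical.

```python
import random


def assembly(x):
    """Hypothetical two-model chain: toy model A feeds toy model B."""
    a_out = x[0] + 0.1 * x[1]            # toy model A
    return a_out * a_out + 0.01 * x[2]   # toy model B consumes A's output


def elementary_effects(f, k, r=20, delta=0.1, seed=0):
    """Mean absolute elementary effect per parameter, plus the run count.

    Simplified radial one-at-a-time design: for each of r random base
    points in [0, 1 - delta]^k, perturb every parameter once by delta.
    """
    rng = random.Random(seed)
    mu_star = [0.0] * k
    evals = 0
    for _ in range(r):
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(k)]
        y0 = f(base)
        evals += 1
        for i in range(k):
            perturbed = list(base)
            perturbed[i] += delta
            mu_star[i] += abs(f(perturbed) - y0) / delta
            evals += 1
    return [m / r for m in mu_star], evals


mu_star, n_runs = elementary_effects(assembly, k=3)
for i, m in enumerate(mu_star):
    print(f"x[{i}]: mean |elementary effect| = {m:.3f}")
print(f"assembly evaluations used: {n_runs}")
```

The run count is r * (k + 1), so 80 evaluations for three parameters and twenty base points. When each evaluation is a full co-simulation of the architecture and k is in the hundreds, this budget grows quickly, which is the cost side of the trade-off.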

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This finding implies that current verification practices focused on single models may need to be extended to system-level interactions in simulations.
Practitioners could benefit from combining multiple approaches to cover different aspects of assembly credibility.
The review points toward the development of standardized frameworks for architecture-level credibility assessment in simulation engineering.

Load-bearing premise

The four approaches (sensitivity analysis, expert qualitative analysis, explainability in AI, and networks) form a representative and sufficient basis for reviewing the state of the art in assembly credibility assessment.

What would settle it

A counterexample where a simulation architecture shows high credibility under all four reviewed methods but produces unreliable results when the full assembly is executed against known ground truth.

Original abstract

Credibility of a simulation model is an important topic. Several approaches try to quantify the credibility of simulation. However, models are mostly assembled within a simulation architecture. Can the credibility of a simulation architecture be assessed based on the credibility of the models that comprise it? This paper aims to address this issue by providing an overview of the current state of the art in the field of assembly credibility. It will compare sensitivity analysis techniques, qualitative analysis by experts, explainability in AI, and networks. Finally, an assessment of the proposed approaches, based on criteria such as rigor, generalization, and resource requirements, will reveal the strengths and weaknesses of each approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a survey paper arguing model credibility does not automatically extend to assembled simulation architectures, with a comparison of four assessment methods that may be too narrow.

Colleague,

The main thing to know: the paper claims you cannot assess a simulation architecture's credibility just from the credibilities of its component models, and it sets up a comparison of four approaches (sensitivity analysis, expert qualitative analysis, AI explainability, and networks) on criteria like rigor, generalization, and resource requirements.

What it does reasonably well is frame the distinction between model-level and architecture-level trust in plain terms and pick comparison criteria that matter for engineering use. That gives a clear structure for thinking about when separate assembly checks are needed.

The soft spot is the choice of those four approaches. They may not cover established techniques such as compositional verification, interface contract checking, or statistical model checking for coupled models. If those are missing, the assessment of strengths and weaknesses cannot securely show that dedicated architecture methods are required. The paper is only an overview with no new data or derivations, so its value rests entirely on whether the literature selection is balanced.

This is for engineers or researchers working on integrated simulations in domains like automotive or systems engineering. A reader looking for a quick map of current options might find it useful as background, but it is not the kind of work that changes practice on its own.

I would send it for peer review if the full text demonstrates a thorough, unbiased survey; the core question is worth airing. Otherwise it risks being too preliminary.

Referee Report

2 major / 1 minor

Summary. The paper claims that the credibility of a simulation architecture cannot necessarily be assessed from the credibility of its constituent models. It provides an overview of the state of the art in assembly credibility assessment by surveying and comparing four approaches—sensitivity analysis techniques, qualitative analysis by experts, explainability in AI, and networks—on criteria including rigor, generalization, and resource requirements, with the goal of revealing strengths and weaknesses of each.

Significance. If the survey and comparison are comprehensive and unbiased, the work would usefully synthesize methods for architecture-level credibility in simulation engineering, a topic of growing importance for complex assembled systems. The explicit multi-criteria evaluation framework could serve as a reference point for practitioners and researchers seeking to move beyond model-level assessments.

major comments (2)

[Abstract] Abstract and planned comparison section: The manuscript positions the four listed approaches (sensitivity analysis, expert qualitative analysis, explainability in AI, and networks) as the basis for assessing the state of the art in assembly credibility. Without an explicit justification or literature search protocol showing that other established techniques (e.g., compositional verification, interface contract checking, or statistical model checking for coupled models) were considered and excluded on principled grounds, the claim that architecture-level assessment is not reducible to model-level credibility rests on an incomplete foundation and cannot securely support the cross-approach comparison of rigor and generalization.
[Abstract] The central claim that dedicated assembly methods are required would be strengthened by at least one concrete example in which model-level credibility metrics demonstrably fail to capture architecture-level issues (e.g., emergent interface inconsistencies or propagation of uncertainty across couplings). No such falsifiable illustration is referenced in the provided abstract or overview.

minor comments (1)

[Abstract] The abstract uses future tense ('It will compare', 'will reveal') which is appropriate for a proposal but should be updated to present tense once the comparisons are executed in the full manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the manuscript's foundation and illustrative power. We address each major comment below and will incorporate revisions accordingly.

Referee: [Abstract] Abstract and planned comparison section: The manuscript positions the four listed approaches (sensitivity analysis, expert qualitative analysis, explainability in AI, and networks) as the basis for assessing the state of the art in assembly credibility. Without an explicit justification or literature search protocol showing that other established techniques (e.g., compositional verification, interface contract checking, or statistical model checking for coupled models) were considered and excluded on principled grounds, the claim that architecture-level assessment is not reducible to model-level credibility rests on an incomplete foundation and cannot securely support the cross-approach comparison of rigor and generalization.

Authors: We agree that an explicit literature search protocol and inclusion/exclusion rationale would strengthen the positioning of the four approaches. The current manuscript presents them as representative methods drawn from the simulation credibility literature, but does not detail the search process. In revision we will add a dedicated subsection (likely in the introduction or methods overview) describing the search strategy, databases used, and criteria that led to focusing on sensitivity analysis, expert judgment, AI explainability, and network techniques. We will also briefly note why methods such as compositional verification and statistical model checking were set aside (primarily because they target formal correctness rather than the broader credibility assessment criteria of rigor, generalization, and resource use that structure our comparison). This addition will make the scope of the survey transparent without altering the core contribution.

Revision: yes

Referee: [Abstract] The central claim that dedicated assembly methods are required would be strengthened by at least one concrete example in which model-level credibility metrics demonstrably fail to capture architecture-level issues (e.g., emergent interface inconsistencies or propagation of uncertainty across couplings). No such falsifiable illustration is referenced in the provided abstract or overview.

Authors: We concur that a concrete, falsifiable example would make the central claim more compelling. Although the manuscript emphasizes the conceptual distinction and the comparative evaluation of assembly-level methods, it does not currently include a worked example of model-level metrics failing at the architecture level. In the revised version we will insert a short illustrative case (drawn from published coupled-simulation studies) early in the introduction, showing how interface inconsistencies or uncertainty propagation can produce architecture-level credibility problems that remain invisible when only individual-model assessments are performed. This example will be referenced again when discussing the limitations of purely model-centric approaches.

Revision: yes

Circularity Check

0 steps flagged

No circularity: review paper with no derivations or fitted predictions

The paper is a survey/overview that compares four existing approaches (sensitivity analysis, expert qualitative analysis, explainability in AI, and networks) for assessing assembly credibility. It contains no equations, no fitted parameters, no predictions derived from inputs, and no self-citation chains that bear the central claim. The central claim—that architecture credibility is not necessarily reducible to model credibility—is supported by the external survey content itself rather than by any internal reduction to the paper's own assumptions or prior outputs. This matches the default expectation for non-circular review papers; the representativeness of the four approaches is a scope limitation, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper; it introduces no free parameters, mathematical axioms, or invented entities.
