HAVE: Host Active Verification Engine for Closing the Contextual Reality Gap in Security Digital Twins
Pith reviewed 2026-06-27 21:52 UTC · model grok-4.3
The pith
A safety-constrained host agent measures empirical compromise probabilities to correct CVSS-based risk estimates in security digital twins.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deploying a safety-constrained host agent to perform snapshot-isolated Bernoulli trials yields an empirical compromise probability that is propagated via a Bayesian blending rule into Monte Carlo simulations, thereby correcting the contextual reality gap between CVSS-derived probabilities and observed system behavior.
What carries the argument
The Host Active Verification Engine (HAVE) deploys a safety-constrained host agent that measures empirical compromise probability through maximum-likelihood estimation over snapshot-isolated Bernoulli trials and weights the result with a Wilson interval for Bayesian blending into Monte Carlo risk simulations.
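The estimation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and both the weight definition (one minus the Wilson interval width) and the convex blend with the CVSS-derived probability are assumptions, since the summary does not give the exact form of the blending rule.

```python
import math


def mle_bernoulli(successes: int, trials: int) -> float:
    """Maximum-likelihood estimate of a Bernoulli success probability."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = mle_bernoulli(successes, trials)
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def confidence_weight(successes: int, trials: int, z: float = 1.96) -> float:
    """ASSUMPTION: weight = 1 - interval width, so narrow intervals earn more trust."""
    low, high = wilson_interval(successes, trials, z)
    return 1.0 - (high - low)


def blend(p_cvss: float, successes: int, trials: int, z: float = 1.96) -> float:
    """ASSUMPTION: convex blend of the empirical estimate and the CVSS-derived prior."""
    alpha = confidence_weight(successes, trials, z)
    p_hat = mle_bernoulli(successes, trials)
    return alpha * p_hat + (1.0 - alpha) * p_cvss
```

Under these assumptions, a vulnerability with a high CVSS-derived probability but zero observed compromises across many trials is pulled strongly toward zero, which corresponds to the false-positive case the paper describes.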
If this is right
- P_reach is reduced 38.2 percent in false-positive scenarios and increased 132.4 percent in false-negative scenarios, producing a net 124.1 percent correction.
- Post-HAVE probability estimates vary by only a factor of 1.12 across different calibration exponents, compared with 4.6 for CVSS-only baselines.
- The same blending rule applies uniformly across four vulnerability classes and three security tiers.
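The κ-stability claim can be illustrated with a small sketch. Everything here is assumed for illustration: the mapping `(score / 10) ** kappa` from a CVSS base score to a probability, the fixed confidence weight `alpha`, and all numeric inputs are hypothetical and are not taken from the paper. The point is only that a blend dominated by an empirical estimate is less sensitive to the calibration exponent than the CVSS-only probability.

```python
def cvss_probability(score: float, kappa: float) -> float:
    """ASSUMPTION: hypothetical calibration mapping from CVSS score to probability."""
    return (score / 10.0) ** kappa


def spread(values: list[float]) -> float:
    """Ratio between the largest and smallest value (the 'x-fold' variation)."""
    return max(values) / min(values)


def kappa_sensitivity(score: float, p_hat: float, alpha: float,
                      kappas: list[float]) -> tuple[float, float]:
    """Return (CVSS-only spread, blended spread) across calibration exponents."""
    cvss_only = [cvss_probability(score, k) for k in kappas]
    blended = [alpha * p_hat + (1.0 - alpha) * p for p in cvss_only]
    return spread(cvss_only), spread(blended)


# Illustrative inputs only; these are not the paper's data.
cvss_spread, blended_spread = kappa_sensitivity(
    score=7.5, p_hat=0.1, alpha=0.9, kappas=[0.5, 1.0, 2.0]
)
print(f"CVSS-only spread: {cvss_spread:.2f}x, blended spread: {blended_spread:.2f}x")
```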
Where Pith is reading between the lines
- The approach could be extended to continuously running agents that update estimates in real time rather than at snapshot intervals.
- If the host-agent overhead remains low, the method may allow digital twins to incorporate live telemetry from production fleets without separate test environments.
- The Beta-Binomial connection suggests the framework could be generalized to other probabilistic security models that currently use static scores.
Load-bearing premise
The snapshot-isolated Bernoulli trials produce an unbiased estimate of true compromise probability without the measurement process itself changing the system's attack surface.
What would settle it
Run a controlled attack campaign on one of the evaluated production binaries, compare the observed fraction of successful compromises against the post-HAVE P_reach value, and check whether the difference exceeds the reported Wilson-interval width.
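The proposed check could be scripted roughly as below. This is a sketch under stated assumptions: the function names are hypothetical, the Wilson width is recomputed from the HAVE trial counts rather than read from the paper, and "exceeds the width" is interpreted as the absolute difference being larger than the full interval width.

```python
import math


def wilson_width(successes: int, trials: int, z: float = 1.96) -> float:
    """Width of the Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return 2.0 * half


def prediction_contradicted(campaign_successes: int, campaign_attempts: int,
                            p_reach_post: float,
                            have_successes: int, have_trials: int) -> bool:
    """True if the observed compromise fraction differs from the post-HAVE
    P_reach by more than the Wilson interval width of the HAVE trials."""
    observed = campaign_successes / campaign_attempts
    return abs(observed - p_reach_post) > wilson_width(have_successes, have_trials)
```

A `True` result would count as evidence against the post-HAVE estimate for that binary; a `False` result is only a failure to contradict it, not a confirmation.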
Original abstract
Security Digital Twins (SDTs) provide continuously updated virtual replicas of infrastructure for threat simulation, yet they rely on theoretical CVSS scores to assign lateral-movement probabilities -- creating the Contextual Reality Gap: risk is overestimated where unacknowledged mitigations neutralize exploits, and drastically underestimated where logic flaws bypass all memory-safety defenses. We present the Host Active Verification Engine (HAVE), an SDT extension that deploys a safety-constrained host agent to measure the empirical probability of compromise $\hat{p}$ via maximum-likelihood estimation over snapshot-isolated Bernoulli trials. A Wilson interval-width confidence weight $\alpha_w$ propagates $\hat{p}$ into Monte Carlo simulations via a Bayesian blending rule formally related to the Beta-Binomial posterior. Evaluation across four vulnerability classes, three security tiers, and two production binaries shows HAVE reduces $P_{\text{reach}}$ by 38.2% in false-positive scenarios and increases it by 132.4% in false-negative scenarios, with a net +124.1% correction; post-HAVE estimates vary by only $1.12\times$ across calibration exponents $\kappa$, versus $4.6\times$ for CVSS-only baselines.
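For readers unfamiliar with the Beta-Binomial connection mentioned in the abstract, the standard identity is reproduced here. With a $\mathrm{Beta}(a, b)$ prior on the compromise probability $p$ and $k$ compromises observed in $n$ trials, so that $\hat{p} = k/n$, the posterior mean is a convex combination of the empirical estimate and the prior mean:

$$
\mathbb{E}[p \mid k, n] = \frac{a + k}{a + b + n} = w\,\hat{p} + (1 - w)\,\frac{a}{a + b}, \qquad w = \frac{n}{a + b + n}.
$$

Reading the prior mean $a/(a+b)$ as the CVSS-derived probability is an assumption made here for illustration. The paper's weight $\alpha_w$ is based on the Wilson interval width and is described only as formally related to this posterior, so it need not equal $w$.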
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Host Active Verification Engine (HAVE) as an extension to Security Digital Twins. It deploys a safety-constrained host agent to perform snapshot-isolated Bernoulli trials, yielding a maximum-likelihood estimate p̂ of compromise probability. This estimate is blended with CVSS-derived probabilities via a Wilson interval-weighted rule formally related to the Beta-Binomial posterior and propagated into Monte Carlo lateral-movement simulations. Evaluation across four vulnerability classes, three security tiers, and two production binaries reports that HAVE reduces P_reach by 38.2% in false-positive scenarios and increases it by 132.4% in false-negative scenarios, for a net +124.1% adjustment, while post-HAVE P_reach values vary by only 1.12× across calibration exponents κ versus 4.6× for CVSS-only baselines.
Significance. If the measurement assumption holds, HAVE supplies a concrete mechanism for closing the contextual reality gap in SDTs by replacing purely theoretical probabilities with empirical estimates. The reported stability across κ and the explicit link to Beta-Binomial updating constitute methodological strengths that could support more reliable threat simulation in production settings.
major comments (1)
- [Methods / Evaluation] Methods / Evaluation sections: The claim that snapshot-isolated Bernoulli trials produce an unbiased p̂ that can be directly propagated into the Monte Carlo model rests on the unverified assertion that the safety-constrained host agent leaves the original attack surface unchanged. No before-versus-after comparison of exploitable states, memory layout, or new code paths is reported for the two production binaries. This is load-bearing for the validity of the Wilson-weighted Beta-Binomial blending rule and the headline correction percentages.
minor comments (1)
- [Abstract] Abstract: The net +124.1% correction is stated without showing how it is derived from the separate 38.2% and 132.4% figures; a one-sentence clarification would aid readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for highlighting the importance of validating the safety constraints on the host agent. We address the major comment below.
- Referee: [Methods / Evaluation] Methods / Evaluation sections: The claim that snapshot-isolated Bernoulli trials produce an unbiased p̂ that can be directly propagated into the Monte Carlo model rests on the unverified assertion that the safety-constrained host agent leaves the original attack surface unchanged. No before-versus-after comparison of exploitable states, memory layout, or new code paths is reported for the two production binaries. This is load-bearing for the validity of the Wilson-weighted Beta-Binomial blending rule and the headline correction percentages.
Authors: We agree that the manuscript does not report an explicit before-versus-after comparison of exploitable states, memory layout, or code paths for the two production binaries, and that this leaves the assertion about an unchanged attack surface unverified in the current text. The safety constraints are described as limiting the agent to non-modifying verification actions with snapshot isolation to restore state, but no empirical confirmation of invariance is provided. To address this, we will revise the Methods section to add a dedicated subsection that (i) enumerates the precise constraints enforced on the agent, (ii) reports the results of a before/after static and dynamic analysis (e.g., diff of memory maps, symbol tables, and reachable exploit paths) performed on the binaries, and (iii) discusses any residual limitations. This will directly support the validity of the Wilson-weighted blending rule and the reported correction percentages.
Revision: yes
Circularity Check
No significant circularity; derivation relies on external empirical measurement and standard Bayesian update
The paper defines HAVE as deploying agents to obtain an independent empirical p̂ via MLE on Bernoulli trials, then applies a Wilson-weighted blend formally related to the Beta-Binomial posterior before feeding the result into Monte Carlo P_reach computation. The reported percentage corrections and 1.12× stability across κ are computed outcomes of this pipeline on the evaluated binaries, not redefinitions or renamings of the input measurements themselves. No equation reduces the final P_reach to a fitted parameter by construction, no self-citation chain bears the central claim, and the measurement assumption is stated separately from the update rule. The derivation therefore remains non-circular.
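The final stage of that pipeline, turning per-edge compromise probabilities into P_reach, can be sketched as a simple Monte Carlo reachability estimate. This is an illustrative sketch only: the attack-graph representation, the node names, and the function `estimate_p_reach` are hypothetical, and the paper's simulator may differ. Each trial samples every lateral-movement edge as an independent Bernoulli event and checks whether the target is reachable from the entry point.

```python
import random
from collections import deque


def estimate_p_reach(edges: dict[tuple[str, str], float], source: str, target: str,
                     trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that `target` is reachable
    from `source` when each edge succeeds independently with its probability."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    hits = 0
    for _ in range(trials):
        # Sample which lateral-movement steps succeed in this trial.
        adjacency: dict[str, list[str]] = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                adjacency.setdefault(u, []).append(v)
        # Breadth-first search over the sampled graph.
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        hits += target in seen
    return hits / trials


# Hypothetical three-host chain; the probabilities stand in for blended estimates.
graph = {("dmz", "app"): 0.5, ("app", "db"): 0.5}
print(estimate_p_reach(graph, "dmz", "db"))
```

Replacing an edge's CVSS-derived probability with a blended value and rerunning the estimate is how a correction at one host would propagate to the network-level figure in this simplified model.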
Axiom & Free-Parameter Ledger
free parameters (1)
- calibration exponent κ
axioms (1)
- domain assumption: Snapshot-isolated Bernoulli trials yield an unbiased estimate of the true compromise probability that can be safely propagated via Wilson-weighted Bayesian blending.