
arxiv: 2606.06968 · v1 · pith:WT5OP4OKnew · submitted 2026-06-05 · 💻 cs.CR

HAVE: Host Active Verification Engine for Closing the Contextual Reality Gap in Security Digital Twins

Pith reviewed 2026-06-27 21:52 UTC · model grok-4.3

classification 💻 cs.CR
keywords: security digital twins · empirical probability estimation · vulnerability assessment · Monte Carlo simulation · lateral movement · CVSS correction · host agent verification

The pith

A safety-constrained host agent measures empirical compromise probabilities to correct CVSS-based risk estimates in security digital twins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that security digital twins overestimate or underestimate lateral movement risk when they rely solely on theoretical CVSS scores. HAVE closes this gap by running snapshot-isolated Bernoulli trials on the live host to obtain a maximum-likelihood estimate of actual compromise probability. This estimate is then blended into Monte Carlo simulations using a Wilson-interval weight, producing corrected reachability probabilities. Evaluation on four vulnerability classes and two production binaries demonstrates substantial corrections in both over- and under-estimation cases together with reduced sensitivity to calibration parameters.

Core claim

Deploying a safety-constrained host agent to perform snapshot-isolated Bernoulli trials yields an empirical compromise probability that is propagated via a Bayesian blending rule into Monte Carlo simulations, thereby correcting the contextual reality gap between CVSS-derived probabilities and observed system behavior.

What carries the argument

The Host Active Verification Engine (HAVE) deploys a safety-constrained host agent that measures empirical compromise probability through maximum-likelihood estimation over snapshot-isolated Bernoulli trials and weights the result with a Wilson interval for Bayesian blending into Monte Carlo risk simulations.
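The estimation and weighting steps named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the specific weight `alpha = 1 - (Wilson interval width)` and the linear blend are assumptions chosen for illustration, since the summary does not state the exact form of the blending rule.

```python
import math


def mle_bernoulli(successes: int, trials: int) -> float:
    """Maximum-likelihood estimate of a Bernoulli success probability."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = mle_bernoulli(successes, trials)
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def confidence_weight(successes: int, trials: int, z: float = 1.96) -> float:
    """ASSUMPTION: weight = 1 - interval width, so narrow intervals weigh more."""
    lo, hi = wilson_interval(successes, trials, z)
    return 1.0 - (hi - lo)


def blend(p_cvss: float, successes: int, trials: int, z: float = 1.96) -> float:
    """ASSUMPTION: linear blend of the empirical estimate and the CVSS prior."""
    alpha = confidence_weight(successes, trials, z)
    p_hat = mle_bernoulli(successes, trials)
    return alpha * p_hat + (1.0 - alpha) * p_cvss
```

Under these assumptions, an exploit that fails in all 100 trials pulls a CVSS-derived probability of 0.9 down to a few percent, while the same outcome over only 10 trials leaves much more of the prior in place.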

If this is right

  • P_reach is reduced 38.2 percent in false-positive scenarios and increased 132.4 percent in false-negative scenarios, producing a net 124.1 percent correction.
  • Post-HAVE probability estimates vary by only a factor of 1.12 across different calibration exponents, compared with 4.6 for CVSS-only baselines.
  • The same blending rule applies uniformly across four vulnerability classes and three security tiers.
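The "factor of 1.12 versus 4.6" comparison is a spread across calibration exponents. One way such a spread could be computed is sketched below. Everything here is an assumption for illustration: the mapping `(score / 10) ** kappa`, the fixed weight `alpha`, the linear blend, and the sample numbers are hypothetical and are not taken from the paper, so the output does not reproduce the reported figures.

```python
def cvss_to_probability(score: float, kappa: float) -> float:
    """ASSUMPTION: map a CVSS base score (0-10) to a probability via an exponent."""
    return (score / 10.0) ** kappa


def blend(p_cvss: float, p_hat: float, alpha: float) -> float:
    """ASSUMPTION: linear blend with a fixed confidence weight alpha."""
    return alpha * p_hat + (1.0 - alpha) * p_cvss


def spread_factor(values) -> float:
    """Ratio of the largest to the smallest estimate across calibration settings."""
    return max(values) / min(values)


# Hypothetical inputs: one CVSS score, one empirical estimate, one weight.
kappas = [0.5, 1.0, 2.0]
cvss_only = [cvss_to_probability(7.5, k) for k in kappas]
post_blend = [blend(p, p_hat=0.2, alpha=0.9) for p in cvss_only]

print("CVSS-only spread:", round(spread_factor(cvss_only), 3))
print("Post-blend spread:", round(spread_factor(post_blend), 3))
```

The design point is that once a high-confidence empirical estimate dominates the blend, the choice of exponent only affects the small residual prior term, so the spread shrinks.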

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to continuously running agents that update estimates in real time rather than at snapshot intervals.
  • If the host-agent overhead remains low, the method may allow digital twins to incorporate live telemetry from production fleets without separate test environments.
  • The Beta-Binomial connection suggests the framework could be generalized to other probabilistic security models that currently use static scores.
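For readers unfamiliar with the Beta-Binomial connection, the standard conjugate-update identity is shown below. It is the textbook result only: the prior parameters $a, b$ and the weight $w$ are generic symbols introduced here, and how the paper's Wilson-based weight $\alpha_w$ maps onto $w$ is not stated in the abstract.

```latex
% Prior: p ~ Beta(a, b), with prior mean p_0 = a / (a + b).
% Data:  k compromises observed in n independent Bernoulli trials.
\begin{align}
  p \mid k, n &\sim \mathrm{Beta}(a + k,\; b + n - k), \\
  \mathbb{E}[p \mid k, n]
    &= \frac{a + k}{a + b + n}
     = w\,\hat{p} + (1 - w)\,p_0,
  \qquad
  \hat{p} = \frac{k}{n}, \quad
  w = \frac{n}{a + b + n}.
\end{align}
```

The posterior mean is therefore a convex combination of the empirical estimate and the prior mean, with the weight on the data growing toward 1 as the number of trials increases.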

Load-bearing premise

The snapshot-isolated Bernoulli trials produce an unbiased estimate of true compromise probability without the measurement process itself changing the system's attack surface.

What would settle it

Run a controlled attack campaign on one of the evaluated production binaries, compare the observed fraction of successful compromises against the post-HAVE P_reach value, and check whether the difference exceeds the reported Wilson-interval width.
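The comparison described above reduces to a single check. The sketch below assumes the post-HAVE value and the reported interval width are supplied by the reader; the function name and the sample numbers are hypothetical.

```python
def campaign_contradicts_estimate(
    p_reach_post: float,
    observed_successes: int,
    attempts: int,
    reported_interval_width: float,
):
    """Return (contradicts, gap): does the observed compromise rate differ from
    the post-HAVE estimate by more than the reported Wilson-interval width?"""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    observed_rate = observed_successes / attempts
    gap = abs(observed_rate - p_reach_post)
    return gap > reported_interval_width, gap


# Hypothetical campaign: 45 successful compromises in 100 attempts,
# against an assumed post-HAVE estimate of 0.30 and interval width of 0.10.
contradicts, gap = campaign_contradicts_estimate(0.30, 45, 100, 0.10)
print(contradicts, round(gap, 2))
```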

Figures

Figures reproduced from arXiv: 2606.06968 by Marco Pasquini, Vincenzo Sammartino.

Figure 1: HAVE System Architecture (Hub-and-Spoke). The Controller (navy) dispatches verification tasks over an mTLS channel. The Agent (amber) enforces an allow-list and a cgroups v2 CPU cap before interacting with the target binary. Each trial is preceded by a hypervisor snapshot restore (green) to guarantee I.I.D. trial conditions. Empirical telemetry (p̂, TTC) flows back to update the SDT's risk model.
Figure 2: summary of the three security tiers in which each program was compiled (Section 8.2, Security Tiers). The caption text itself was not recovered.
Figure 3: Five-Node Dual-Path Attack Graph G_MC. n0: attacker foothold; n1: DMZ web server (Stack Overflow, High); n2: secondary server (Command Injection, none); n3: internal convergence node (OR, reachable via Path A or Path B); n4* = n*: SCADA HMI target. Edge labels show pre-HAVE CVSS-derived weights. The accompanying text describes Path A (… n1 → n3, involving a memory-corruption exploit chain) and Path B (n0 → n2 → n3, involving a logic-flaw explo…
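To make the dual-path structure in the Figure 3 caption concrete, the following sketch estimates the probability of reaching the target by Monte Carlo sampling and compares it with the closed-form value for this topology. The edge weights are hypothetical placeholders (the paper's weights are not given in this summary), the assumption that Path A starts with n0 → n1 is inferred from the caption, and edges are assumed to succeed independently.

```python
import random

# HYPOTHETICAL edge success probabilities; not the paper's values.
EDGES = {
    ("n0", "n1"): 0.8,  # Path A, first hop (assumed)
    ("n1", "n3"): 0.5,  # Path A, second hop
    ("n0", "n2"): 0.3,  # Path B, first hop
    ("n2", "n3"): 0.9,  # Path B, second hop
    ("n3", "n4"): 0.6,  # convergence node to target
}


def simulate_once(edges, rng) -> bool:
    """Sample every edge independently, then test reachability of n4 from n0."""
    open_edges = {edge for edge, p in edges.items() if rng.random() < p}
    reached = {"n0"}
    frontier = ["n0"]
    while frontier:
        node = frontier.pop()
        for src, dst in open_edges:
            if src == node and dst not in reached:
                reached.add(dst)
                frontier.append(dst)
    return "n4" in reached


def estimate_p_reach(edges, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that the target is reached."""
    rng = random.Random(seed)
    hits = sum(simulate_once(edges, rng) for _ in range(trials))
    return hits / trials


def exact_p_reach(edges) -> float:
    """Closed form for this graph: OR of two two-hop paths, then the final hop."""
    path_a = edges[("n0", "n1")] * edges[("n1", "n3")]
    path_b = edges[("n0", "n2")] * edges[("n2", "n3")]
    return (1.0 - (1.0 - path_a) * (1.0 - path_b)) * edges[("n3", "n4")]
```

Replacing one edge weight with a blended empirical value and re-running `estimate_p_reach` shows how a single corrected edge shifts the reachability of the target.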
Original abstract

Security Digital Twins (SDTs) provide continuously updated virtual replicas of infrastructure for threat simulation, yet they rely on theoretical CVSS scores to assign lateral-movement probabilities -- creating the Contextual Reality Gap: risk is overestimated where unacknowledged mitigations neutralize exploits, and drastically underestimated where logic flaws bypass all memory-safety defenses. We present the Host Active Verification Engine (HAVE), an SDT extension that deploys a safety-constrained host agent to measure the empirical probability of compromise $\hat{p}$ via maximum-likelihood estimation over snapshot-isolated Bernoulli trials. A Wilson interval-width confidence weight $\alpha_w$ propagates $\hat{p}$ into Monte Carlo simulations via a Bayesian blending rule formally related to the Beta-Binomial posterior. Evaluation across four vulnerability classes, three security tiers, and two production binaries shows HAVE reduces $P_{\text{reach}}$ by 38.2% in false-positive scenarios and increases it by 132.4% in false-negative scenarios, with a net +124.1% correction; post-HAVE estimates vary by only $1.12\times$ across calibration exponents $\kappa$, versus $4.6\times$ for CVSS-only baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces the Host Active Verification Engine (HAVE) as an extension to Security Digital Twins. It deploys a safety-constrained host agent to perform snapshot-isolated Bernoulli trials, yielding a maximum-likelihood estimate p̂ of compromise probability. This estimate is blended with CVSS scores via a Wilson interval-weighted rule formally related to the Beta-Binomial posterior and propagated into Monte Carlo lateral-movement simulations. Evaluation across four vulnerability classes, three security tiers, and two production binaries reports that HAVE reduces P_reach by 38.2% in false-positive scenarios and increases it by 132.4% in false-negative scenarios, for a net +124.1% adjustment, while post-HAVE P_reach values vary by only 1.12× across calibration exponents κ versus 4.6× for CVSS-only baselines.

Significance. If the measurement assumption holds, HAVE supplies a concrete mechanism for closing the contextual reality gap in SDTs by replacing purely theoretical probabilities with empirical estimates. The reported stability across κ and the explicit link to Beta-Binomial updating constitute methodological strengths that could support more reliable threat simulation in production settings.

major comments (1)
  1. [Methods / Evaluation] The claim that snapshot-isolated Bernoulli trials produce an unbiased p̂ that can be directly propagated into the Monte Carlo model rests on the unverified assertion that the safety-constrained host agent leaves the original attack surface unchanged. No before-versus-after comparison of exploitable states, memory layout, or new code paths is reported for the two production binaries. This is load-bearing for the validity of the Wilson-weighted Beta-Binomial blending rule and the headline correction percentages.
minor comments (1)
  1. [Abstract] The net +124.1% correction is stated without showing how it is derived from the separate 38.2% and 132.4% figures; a one-sentence clarification would aid readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the importance of validating the safety constraints on the host agent. We address the major comment below.

Point-by-point responses
  1. Referee: [Methods / Evaluation] The claim that snapshot-isolated Bernoulli trials produce an unbiased p̂ that can be directly propagated into the Monte Carlo model rests on the unverified assertion that the safety-constrained host agent leaves the original attack surface unchanged. No before-versus-after comparison of exploitable states, memory layout, or new code paths is reported for the two production binaries. This is load-bearing for the validity of the Wilson-weighted Beta-Binomial blending rule and the headline correction percentages.

    Authors: We agree that the manuscript does not report an explicit before-versus-after comparison of exploitable states, memory layout, or code paths for the two production binaries, and that this leaves the assertion about an unchanged attack surface unverified in the current text. The safety constraints are described as limiting the agent to non-modifying verification actions with snapshot isolation to restore state, but no empirical confirmation of invariance is provided. To address this, we will revise the Methods section to add a dedicated subsection that (i) enumerates the precise constraints enforced on the agent, (ii) reports the results of a before/after static and dynamic analysis (e.g., diff of memory maps, symbol tables, and reachable exploit paths) performed on the binaries, and (iii) discusses any residual limitations. This will directly support the validity of the Wilson-weighted blending rule and the reported correction percentages.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external empirical measurement and standard Bayesian update

full rationale

The paper defines HAVE as deploying agents to obtain an independent empirical p̂ via MLE on Bernoulli trials, then applies a Wilson-weighted blend formally related to the Beta-Binomial posterior before feeding the result into Monte Carlo P_reach computation. The reported percentage corrections and 1.12× stability across κ are computed outcomes of this pipeline on the evaluated binaries, not redefinitions or renamings of the input measurements themselves. No equation reduces the final P_reach to a fitted parameter by construction, no self-citation chain bears the central claim, and the measurement assumption is stated separately from the update rule. The derivation therefore remains non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available; the ledger is therefore incomplete and limited to elements explicitly named in the abstract.

free parameters (1)
  • calibration exponent κ
    The abstract states that post-HAVE estimates vary across values of κ, indicating κ is a tunable parameter in the blending rule.
axioms (1)
  • Domain assumption: Snapshot-isolated Bernoulli trials yield an unbiased estimate of the true compromise probability that can be safely propagated via Wilson-weighted Bayesian blending.
    This premise underpins the maximum-likelihood estimation step and the subsequent Monte Carlo update.


