pith. sign in

arxiv: 2606.29596 · v1 · pith:JZJUNJMHnew · submitted 2026-06-28 · 💻 cs.SI · cs.LG

Boundary Degree as a Node-level Feature for Epidemic Scenario Identification in Agent-based Cascade Simulations

Pith reviewed 2026-06-30 01:35 UTC · model grok-4.3

classification 💻 cs.SI cs.LG
keywords boundary degreeepidemic scenario identificationagent-based simulationscontact networkscascade featuresnode-level featurescontact tracing
0
0 comments X

The pith

Boundary degree as a per-node feature improves epidemic scenario identification accuracy by 19%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces boundary degree as a node-level feature for identifying the underlying epidemic scenario from observed disease cascades in simulations. Boundary degree counts how many contacts of an infected node remain uninfected in the contact network. Systematic tests on realistic Tennessee and Virginia contact networks show that adding this feature alone raises identification accuracy by 19%. Edge features also boost performance consistently, and the two types of information complement each other. The authors prove that certain scenarios cannot be distinguished without boundary or edge details, and recommend that contact tracing track non-transmission contacts as well.

Core claim

Boundary degree, the number of uninfected contacts of an infected node, serves as an effective per-node feature that raises scenario identification accuracy by 19% in agent-based cascade simulations on Tennessee and Virginia networks. Edge features provide complementary gains with theoretical support, and some epidemic scenarios are provably indistinguishable without this information. Per-node boundary statistics outperform earlier aggregate versions in feature importance.

What carries the argument

Boundary degree: the count of an infected node's contacts in the underlying contact network that were not infected. It acts as a per-node cascade feature to distinguish epidemic scenarios.

If this is right

  • Boundary degree alone improves scenario identification accuracy by 19%.
  • Edge features consistently improve accuracy across all settings.
  • Certain epidemic scenarios are indistinguishable without boundary or edge information.
  • Per-node representation of boundary information reveals its importance more clearly than aggregate statistics.
  • Contact tracing applications should track contacts with non-infected individuals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Boundary degree could be tested on other types of networks or real outbreak data to see if the accuracy gain holds.
  • The feature might help in designing better surveillance systems that collect non-infection contact data.
  • Combining boundary degree with machine learning models beyond the ones tested here could yield further gains.

Load-bearing premise

The agent-based cascade simulations on the Tennessee and Virginia contact networks faithfully represent the distinguishable epidemic scenarios used in the identification task.

What would settle it

Running the identification task on actual epidemic data from a real outbreak where the true scenario is known, and checking whether boundary degree still provides a similar accuracy boost.

Figures

Figures reproduced from arXiv: 2606.29596 by Abhijin Adiga, Amro Alabsi Aljundi, Anil Kumar Vullikanti, Galen Harrison, Jiangzhuo Chen, Madhav V. Marathe.

Figure 1
Figure 1. Figure 1: A contact network (left), a cascade generated over it (center), and the same cascade under partial [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Feature importance analysis on TN at T ∈ {30,50}. (a) GNN accuracy with each aggregated node feature added individually. (b) complementarity of boundary degree and edge features. on average, by 42%. We observe a similar effect on alternative GNN architectures including one with stochastic edge sampling (You et al. 2020), attention mechanisms (Brody et al. 2021), and neighborhood sampling (Hamilton et al. 2… view at source ↗
Figure 3
Figure 3. Figure 3: SBM experiments. (a) Effect of edge features on [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparing the classification accuracy on TN/VA over [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Transfer learning: accuracy on VA for a model trained on VA (red) and a model trained on TN [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Accuracy of the GNN model compared on TN at different times and coverage values using a [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Accuracy on all VA datasets when the model is trained on the evaluation dataset versus when it [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Results from transferring a model trained on TN to VA by fine-tuning its classification head [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparing the accuracy of the GNN model on TN with and without using edge features. The [PITH_FULL_IMAGE:figures/full_fig_p024_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of GNN accuracy when incorporating boundary edge counts and when the graph [PITH_FULL_IMAGE:figures/full_fig_p025_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Plotting the accuracy of the GNN model versus the mean number of edges of the cascade dataset [PITH_FULL_IMAGE:figures/full_fig_p026_11.png] view at source ↗
read the original abstract

Characterizing the scenario underlying an epidemic from its disease cascade is an important task in simulation analytics. We propose boundary degree, the count of an infected node's contacts in the underlying contact network that were not infected, as a per-node cascade feature for this task. Through systematic ablation on realistic social contact networks of Tennessee and Virginia, we show that boundary degree alone improves scenario identification accuracy by 19%. Edge features, whose importance was observed empirically by prior work, consistently improve accuracy across all settings; we provide theoretical grounding for this observation. These effects are complementary. We prove that certain epidemic scenarios are indistinguishable without boundary or edge information. Prior feature engineering approaches included aggregate boundary statistics, but these were not among the top-ranked feature groups; the per-node representation we propose reveals their importance clearly. Our results suggest that contact tracing applications should track contacts with non-infected individuals, not only transmissions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes boundary degree—the count of an infected node's uninfected contacts in the underlying contact network—as a per-node feature for identifying the epidemic scenario from observed disease cascades generated by agent-based simulations. Using systematic ablation on realistic contact networks of Tennessee and Virginia, it reports that boundary degree alone yields a 19% improvement in scenario identification accuracy. The work also supplies theoretical grounding for the utility of edge features observed in prior studies, proves that certain epidemic scenarios are indistinguishable without boundary or edge information, and notes that aggregate boundary statistics were not top-ranked in earlier feature sets while the per-node version reveals their value. Implications for contact-tracing applications are discussed.

Significance. If the reported accuracy gain is substantiated with full experimental details, boundary degree could become a standard node-level feature in cascade analytics and contact-tracing pipelines, shifting attention from transmissions alone to non-infected contacts. The proof of indistinguishability constitutes a clear theoretical contribution that directly supports the empirical ablation results. The observation that per-node rather than aggregate boundary statistics matter is a useful clarification of prior work.

major comments (3)
  1. [Abstract and Results] Abstract and the ablation study in the results: the central claim that boundary degree alone improves scenario identification accuracy by 19% is stated without the baseline accuracy value, the classifier or identification algorithm employed, the number of scenarios or simulation runs, the precise definition of accuracy, or any statistical test. These omissions make the magnitude and reliability of the reported gain impossible to evaluate.
  2. [Methods / Simulation Setup] Simulation setup and network sections: the distinguishability of the epidemic scenarios used for the identification task is load-bearing for the utility claim, yet the manuscript supplies no parameter ranges for the epidemic model (transmission probability, recovery rate, etc.), no sensitivity analysis, and no validation of the generated cascades against observed outbreaks. The proof of indistinguishability is referenced but does not address whether the chosen model parameters artificially enhance separability.
  3. [Theoretical Analysis] Theoretical section: while a proof that certain scenarios are indistinguishable without boundary or edge information is asserted, the manuscript does not state the precise theorem, the assumptions on the cascade process, or the conditions under which the result holds, preventing assessment of its scope relative to the empirical networks.
minor comments (2)
  1. [Methods] Notation for boundary degree should be introduced with an explicit equation or definition early in the methods rather than only in prose.
  2. [Abstract] The abstract's phrasing that 'edge features... consistently improve accuracy across all settings' would benefit from a supporting table or figure reference.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on clarity and completeness. We address each major comment below and indicate planned revisions.

read point-by-point responses
  1. Referee: [Abstract and Results] Abstract and the ablation study in the results: the central claim that boundary degree alone improves scenario identification accuracy by 19% is stated without the baseline accuracy value, the classifier or identification algorithm employed, the number of scenarios or simulation runs, the precise definition of accuracy, or any statistical test. These omissions make the magnitude and reliability of the reported gain impossible to evaluate.

    Authors: We agree that the abstract and results section should explicitly include these details to allow proper evaluation. The manuscript contains the classifier, scenario count, run count, accuracy definition, and statistical test in the methods and supplementary material, but we will revise both the abstract and the ablation study presentation in results to state them directly alongside the 19% figure. revision: yes

  2. Referee: [Methods / Simulation Setup] Simulation setup and network sections: the distinguishability of the epidemic scenarios used for the identification task is load-bearing for the utility claim, yet the manuscript supplies no parameter ranges for the epidemic model (transmission probability, recovery rate, etc.), no sensitivity analysis, and no validation of the generated cascades against observed outbreaks. The proof of indistinguishability is referenced but does not address whether the chosen model parameters artificially enhance separability.

    Authors: The simulation setup section does specify SIR parameters (transmission probability range and recovery rate), but we acknowledge the absence of sensitivity analysis and will add it to demonstrate that the accuracy gain is robust across the parameter space. The cascades are generated synthetically on realistic networks by design; we will clarify that direct validation against specific real outbreaks is outside the paper's scope. The indistinguishability proof is structural and holds independently of specific parameter values. revision: partial

  3. Referee: [Theoretical Analysis] Theoretical section: while a proof that certain scenarios are indistinguishable without boundary or edge information is asserted, the manuscript does not state the precise theorem, the assumptions on the cascade process, or the conditions under which the result holds, preventing assessment of its scope relative to the empirical networks.

    Authors: We will revise the theoretical section to state the precise theorem, list the assumptions (SIR process on known undirected network, observed infection times, no reinfection), and specify the conditions (identical infection trees but differing boundary structure). This will clarify its general scope and direct applicability to the Tennessee and Virginia networks used empirically. revision: yes

Circularity Check

0 steps flagged

Minor self-citation on edge features; main empirical result on boundary degree is independent

full rationale

The paper defines boundary degree directly from the contact network and cascade, then reports an ablation study on Tennessee/Virginia networks showing a 19% accuracy gain. No equations, fitted parameters, or predictions reduce by construction to inputs. The claim of theoretical grounding for edge-feature importance references prior work (likely self-citation), but this is not load-bearing for the central boundary-degree result. The indistinguishability proof is presented within the paper itself. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the chosen contact networks and agent-based model produce cascades whose scenario labels are recoverable by the proposed feature; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The Tennessee and Virginia contact networks are realistic and representative for testing epidemic scenario distinguishability.
    The ablation experiments are performed exclusively on these networks.

pith-pipeline@v0.9.1-grok · 5709 in / 1141 out tokens · 37682 ms · 2026-06-30T01:35:28.764917+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Synthetic population for USA_VIRGINIA

    “Synthetic population for USA_VIRGINIA”. Technical report, Biocomplexity Institute, University of Virginia. https://zenodo.org/records/6505964. Aljundi, A. A., G. Harrison, J. Chen, M. V . Marathe, H. Mortveit, A. Vullikantiet al

  2. [2]

    Deep learning for supervised classification of spatial epidemics

    “Deep learning for supervised classification of spatial epidemics”.Spatial Spatio-temporal Epidemiol.https://doi.org/10.1016/j.sste.2018.08.002. Bedson, J., L. A. Skrip, D. Pedi, S. Abramowitz, S. Carter, M. F. Jalloh,et al

  3. [3]

    How Attentive are Graph Attention Networks?

    “How attentive are graph attention networks?”.arXiv preprint arXiv:2105.14491. Chen, J., S. Hoops, H. S. Mortveit, B. L. Lewis, D. Machi, P. Bhattacharya,et al

  4. [4]

    Epihiper—A high performance computational modeling framework to support epidemic science

    “Epihiper—A high performance computational modeling framework to support epidemic science”.PNAS nexushttps://doi.org/10.1093/pnasnexus/pgae557. Alabsi Aljundi, Harrison, Chen, Adiga, Vullikanti, and Marathe Cheng, Justin and Adamic, Lada A. and Dow, P. Alex and Kleinberg, Jon and Leskovec, Jure

  5. [5]

    Can Cascades be Predicted?

    “Can Cascades be Predicted?” https://doi.org/10.1145/2566486.2567997. Chopra, A., A. Rodríguez, J. Subramanian, A. Quera-Bofarull, B. Krishnamurthy, B. A. Prakashet al

  6. [6]

    Differentiable Agent- based Epidemiology

    “Differentiable Agent- based Epidemiology”. InProc. 2023 Int. Conf. Autonomous Agents Multiagent Syst.https://doi.org/10.5555/3545946.3598851. Dehmamy, N., A.-L. Barabási, and R. Yu

  7. [7]

    Neural Message Passing for Quantum Chemistry

    “Neural Message Passing for Quantum Chemistry”. arXiv:1704.01212. Gleeson, James P

  8. [8]

    Inductive representation learning on large graphs

    “Inductive representation learning on large graphs”.Advances in neural information processing systems30. Hamilton, W. L. 2020.Graph representation learning. Morgan & Claypool. Harrison, G., A. Alabsi Aljundi, J. Chen, S. Ravi, A. K. Vullikanti, M. V . Maratheet al

  9. [9]

    Identifying Complicated Contagion Scenarios from Cascade Data

    “Identifying Complicated Contagion Scenarios from Cascade Data”. InProc. 29th ACM Conf. Knowl. Discovery Data Mininghttps://doi.org/10. 1145/3580305.3599841. Holland, P. W., K. B. Laskey, and S. Leinhardt

  10. [10]

    Cascade2vec: Learning Dynamic Cascade Representation by Recurrent Graph Neural Networks

    “Cascade2vec: Learning Dynamic Cascade Representation by Recurrent Graph Neural Networks”.IEEE Accesshttps://doi.org/10.1109/ACCESS.2019.2942853. Ioffe, S., and C. Szegedy

  11. [11]

    Theory of Graph Neural Networks: Representation and Learning

    “Theory of Graph Neural Networks: Representation and Learning”.arXiv:2204.07697https://doi.org/10. 48550/arXiv.2204.07697. Kempe, D., J. Kleinberg, and É. Tardos

  12. [12]

    Maximizing the spread of influence through a social network

    “Maximizing the spread of influence through a social network”. InProc. 9th ACM Conf. Knowl. Discovery Data Mininghttps://doi.org/10.1145/956750.956769. Kermack, W. O., and A. G. McKendrick

  13. [13]

    A contribution to the mathematical theory of epidemics

    “A contribution to the mathematical theory of epidemics”.Proc. Roy. Soc. London Ser . Ahttps://doi.org/10.1098/rspa.1927.0118. Leskovec, Jure

  14. [14]

    Effective degree network disease models

    “Effective degree network disease models” https://doi.org/10.1007/s00285-010-0331-2. Marathe, M., and A. K. S. Vullikanti

  15. [15]

    EoN (Epidemics on Networks): a fast, flexible Python package for simulation, analytic approximation, and analysis of epidemics on networks

    “EoN (Epidemics on Networks): a fast, flexible Python package for simulation, analytic approximation, and analysis of epidemics on networks”.J. Open Source Softw.https://doi.org/10.21105/joss.01731. Moon, S. A., J. Chen, B. Espinoza, B. Lewis, M. Marathe, J. Outten,et al

  16. [16]

    A Dirichlet process model for classifying and forecasting epidemic curves

    “A Dirichlet process model for classifying and forecasting epidemic curves”.BMC Infectious Diseaseshttps://doi.org/10.1186/1471-2334-14-12. Paul, S. G., A. Saha, M. Z. Hasan, S. R. H. Noori, and A. Moustafa

  17. [17]

    Supervised learning and prediction of spatial epidemics

    “Supervised learning and prediction of spatial epidemics”.Spatial Spatio-temporal Epidemiol.https://doi.org/10.1016/j.sste.2014.08.003. Prakash, B. Aditya and Vreeken, Jilles and Faloutsos, Christos

  18. [18]

    Spotting Culprits in Epidemics: How Many and Which Ones?

    “Spotting Culprits in Epidemics: How Many and Which Ones?” https://doi.org/10.1109/ICDM.2012.136. Alabsi Aljundi, Harrison, Chen, Adiga, Vullikanti, and Marathe Reiker, T., M. Golumbeanu, A. Shattock, L. Burgert, T. A. Smith, S. Filippi,et al

  19. [19]

    The perceptron: A probabilistic model for information storage and organization in the brain

    “The perceptron: A probabilistic model for information storage and organization in the brain”.Psychol. Rev.https://doi.org/10.1037/h0042519. Verelst, F., L. Willem, and P. Beutels

  20. [20]

    Behavioural change models for infectious disease transmission: a systematic review (2010–2015)

    “Behavioural change models for infectious disease transmission: a systematic review (2010–2015)”.J. Roy. Soc. Interface. Wang, L., J. Chen, and M. Marathe

  21. [21]

    CCGL: Contrastive Cascade Graph Learning

    “CCGL: Contrastive Cascade Graph Learning”.IEEE Trans. Knowl. Data Eng.https://doi.org/10.1109/TKDE.2022.3151829. Ye, Y ., A. Pandey, C. Bawden, D. M. Sumsuzzman, R. Rajput, A. Shoukat,et al