pith. machine review for the scientific record.

arxiv: 2604.14685 · v1 · submitted 2026-04-16 · 💻 cs.CR

Recognition: unknown

Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection

Binyan Xu, Di Tang, Fan Yang, Kehuan Zhang


Pith reviewed 2026-05-10 11:17 UTC · model grok-4.3

classification 💻 cs.CR
keywords provenance graphs · intrusion detection · multi-view fusion · anomaly detection · graph-based security · causality analysis · voting aggregation · cybersecurity

The pith

A multi-view fusion framework improves provenance-based intrusion detection by combining attribute, structure, and causality anomaly signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the limitations of existing provenance-based intrusion detection methods that rely either on node-centric or edge-centric analysis. Node-centric approaches can misclassify normal changes as attacks, while edge-centric ones may miss context from entities. PROVFUSION integrates signals from three views—attribute, structure, and causality—using lightweight fusion and voting to make final decisions. This provides a more balanced assessment that captures both entity deviations and interaction anomalies. Experiments show it outperforms single-view baselines on nine datasets with higher accuracy and fewer false positives.

Core claim

PROVFUSION integrates anomaly signals from attribute, structure, and causality views in provenance graphs through lightweight fusion schemes and a voting-based process to determine final anomaly decisions, enabling capture of both entity-level deviations and interaction-level anomalies in a consistent pipeline.

What carries the argument

The multi-view fusion framework that combines heterogeneous anomaly signals from three views via lightweight fusion and voting-based aggregation.
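The score-level part of this fusion can be sketched in a few lines. This is a minimal illustration, not PROVFUSION's implementation: it assumes each view already emits one raw anomaly score per node, rescales each view to [0, 1] with min-max normalization (one of the schemes named in the simulated rebuttal below), and averages the rescaled scores. The view names and score values are hypothetical.

```python
import numpy as np

def min_max(scores: np.ndarray) -> np.ndarray:
    """Rescale one view's raw anomaly scores to [0, 1]."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # constant scores carry no ranking information
        return np.zeros_like(scores, dtype=float)
    return (scores - lo) / (hi - lo)

def fuse_views(view_scores: dict) -> np.ndarray:
    """Average the min-max-normalized scores of all views (one fused score per node)."""
    normalized = [min_max(np.asarray(s, dtype=float)) for s in view_scores.values()]
    return np.mean(normalized, axis=0)

# Hypothetical raw scores for 4 nodes; the views deliberately use different scales.
scores = {
    "attribute": np.array([0.1, 0.2, 0.9, 0.15]),
    "structure": np.array([3.0, 2.5, 8.0, 2.0]),
    "causality": np.array([10.0, 40.0, 90.0, 10.0]),
}
fused = fuse_views(scores)
```

Normalizing before averaging keeps a view with a large numeric range (here, causality) from dominating the fused score. Min-max scaling is sensitive to a single extreme value per view, so rank-based or robust rescaling is a common alternative when scores are heavy-tailed.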

If this is right

  • PROVFUSION achieves higher detection accuracy than node- or edge-centric baselines on nine benchmark datasets.
  • It maintains lower false-positive rates across various scenarios.
  • The approach provides a consistent and context-aware assessment of system behavior.
  • It captures both entity-level deviations and interaction-level anomalies within one pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such fusion methods could be adapted to other domains involving multi-perspective graph analysis, like network traffic monitoring.
  • If the views are not fully independent, fusion might introduce correlations that affect performance on sophisticated attacks.
  • Lightweight fusion suggests potential for deployment in resource-constrained environments.

Load-bearing premise

The anomaly signals from the attribute, structure, and causality views are complementary enough that their fusion improves detection without introducing new errors or missing coordinated attacks.

What would settle it

A test on new datasets or attack scenarios in which the multi-view method performs no better than, or worse than, the best single-view detector would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.14685 by Binyan Xu, Di Tang, Fan Yang, Kehuan Zhang.

Figure 1. Visualization of node- and edge-centric detection. No malicious nodes show high scores in both methods. … suggesting that node-type prediction alone tends to evaluate local patterns rather than interaction abnormality, providing limited evidence of compromise. Summary: Overall, node-centric analysis primarily focuses on an entity’s local patterns or structural consistency. This focus allows it to capture st…
Figure 2. The overall architecture of PROVFUSION. It takes raw system logs, prepares a provenance graph, characterizes node anomalies from three complementary views (Attribute, Structural, Causal), and finally fuses these scores to detect anomalies. 4.1. Provenance Graph Preparation. We transform raw system logs into an attributed provenance graph that supports subsequent analysis. 4.1.1. Graph Construction. We parse …
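The graph-construction step in the Figure 2 excerpt can be illustrated with a toy parser. Everything here is an assumption for illustration: the comma-separated log format, the node types, and the convention that read-like actions (read, recv, load) are oriented from object to subject so that every edge follows the direction of information flow. Real audit sources such as Linux Audit or ETW produce far richer records than this.

```python
from collections import defaultdict

# Hypothetical: actions whose information flows from the object into the subject.
READ_LIKE = {"read", "recv", "load"}

def build_graph(log_lines):
    """Parse 'ts,subject,subject_type,action,object,object_type' lines into an attributed digraph."""
    node_type = {}             # node id -> PROCESS / FILE / NETWORK
    edges = defaultdict(list)  # src -> list of (dst, action, ts)
    for line in log_lines:
        ts, subj, s_type, action, obj, o_type = line.strip().split(",")
        node_type.setdefault(subj, s_type)
        node_type.setdefault(obj, o_type)
        # Orient each edge along the direction of information flow.
        src, dst = (obj, subj) if action in READ_LIKE else (subj, obj)
        edges[src].append((dst, action, int(ts)))
    return node_type, edges

logs = [
    "1,bash,PROCESS,fork,curl,PROCESS",
    "2,curl,PROCESS,recv,10.0.0.5:443,NETWORK",
    "3,curl,PROCESS,write,/tmp/payload,FILE",
    "4,bash,PROCESS,read,/tmp/payload,FILE",
]
node_type, edges = build_graph(logs)
```

Keeping the timestamp on each edge preserves the ordering that a causality view needs, while the node-type map is the simplest kind of node attribute an attribute view could start from.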
Figure 3. Impact of the voting threshold (Tv) on performance. A threshold of Tv = 4 provides the optimal balance, maximizing the (b) F1 and (c) MCC scores while dramatically reducing the (a) false positives seen at lower thresholds.

Table excerpt: Individual model families (best over 4 normalizations)

  Method  Dataset  MCC ↑ E3  MCC ↑ E5  MCC ↑ Avg  ADP ↑ E3  ADP ↑ E5  ADP ↑ Avg  Detect
  AE      CADETS   .571      .100      .336       1.00      .917      .958       5/5
          CLRSCP   .117      .291      .204       1.00      .553      …
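The threshold vote in Figure 3 can be sketched as follows, under an assumption the excerpt does not confirm: several detectors (for example, different model families or normalizations) each cast one binary vote per node, and a node is flagged when it collects at least Tv votes. The decision matrix and labels are hypothetical; the loop reproduces the kind of trade-off the figure describes, where raising Tv removes false positives until true positives start to drop.

```python
import numpy as np

def vote(decisions: np.ndarray, t_v: int) -> np.ndarray:
    """Flag a node when at least t_v detectors flag it.

    decisions: bool array of shape (n_detectors, n_nodes).
    """
    return decisions.sum(axis=0) >= t_v

# Hypothetical votes from 6 detectors on 5 nodes; only node 4 is malicious.
decisions = np.array([
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
], dtype=bool)
labels = np.array([0, 0, 0, 0, 1], dtype=bool)

results = {}  # t_v -> (true positives, false positives)
for t_v in range(1, decisions.shape[0] + 1):
    flagged = vote(decisions, t_v)
    results[t_v] = (int(np.sum(flagged & labels)), int(np.sum(flagged & ~labels)))
```

In this toy run, thresholds of 1 to 3 keep some benign nodes flagged, 4 and 5 isolate the malicious node, and 6 loses it. The best threshold on real data has to be chosen empirically, as Figure 3 does.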
Figure 5. Scalability analysis. … While this step, which compares test nodes against all training nodes, is often a computational bottleneck, we address this by integrating the Faiss library [44], [45] to accelerate the nearest-neighbor search. This optimization is highly effective, leading to consistently strong inference performance. As a result, PROVFUSION is faster than both baselines on many datasets. For insta…
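The nearest-neighbor comparison in the Figure 5 excerpt can be sketched with brute-force NumPy. This assumes, hypothetically, that each node is represented by a fixed-length embedding and that its anomaly score is the mean distance to its k nearest training nodes. The brute-force version computes every test-to-train distance, which is the bottleneck the excerpt refers to. With Faiss, the same exact search would build `faiss.IndexFlatL2(dim)`, call `index.add(train)`, and then `index.search(test, k)`, which returns squared L2 distances and neighbor indices.

```python
import numpy as np

def knn_anomaly_scores(train: np.ndarray, test: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean Euclidean distance from each test embedding to its k nearest training embeddings."""
    # Pairwise squared distances via ||a||^2 - 2ab + ||b||^2, shape (n_test, n_train).
    sq = (
        np.sum(test**2, axis=1)[:, None]
        - 2.0 * test @ train.T
        + np.sum(train**2, axis=1)[None, :]
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    # Indices of the k smallest distances in each row (unordered within the k).
    nearest = np.argpartition(sq, kth=k - 1, axis=1)[:, :k]
    dists = np.sqrt(np.take_along_axis(sq, nearest, axis=1))
    return dists.mean(axis=1)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))  # hypothetical benign node embeddings
test = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 8)),  # benign-looking nodes
    rng.normal(6.0, 1.0, size=(1, 8)),  # one node far from all training data
])
scores = knn_anomaly_scores(train, test, k=3)
```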
Figure 6. Hyperparameter sensitivity on the THEIA-E3 dataset.
Figure 7. The attack subgraph generated by PROVFUSION for the CLEARSCOPE-E3 dataset, highlighting the key malicious events and their causal relationships. In this graph, circles represent FILE, squares represent PROCESS, and diamonds represent NETWORK nodes. Entities marked in red denote true positives detected by PROVFUSION. Beyond aggregate metrics, we examine whether PROVFUSION produces actionable alerts for …
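Turning node-level alerts into an attack subgraph like the one in Figure 7 can be approximated by keeping only edges whose endpoints were both flagged and grouping those edges into weakly connected components. This is a hypothetical post-processing step for illustration, not the procedure PROVFUSION uses, and the node names are invented.

```python
from collections import defaultdict

def alert_subgraphs(edges, flagged):
    """Induce a subgraph on flagged nodes and split it into weakly connected components."""
    kept = [(u, v) for u, v in edges if u in flagged and v in flagged]
    neighbors = defaultdict(set)  # undirected view of the kept edges
    for u, v in kept:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        components.append(component)
    return kept, components

edges = [
    ("bash", "curl"), ("curl", "/tmp/payload"), ("/tmp/payload", "bash"),
    ("sshd", "/etc/passwd"), ("cron", "/var/log/syslog"),
]
flagged = {"bash", "curl", "/tmp/payload", "sshd", "/etc/passwd", "cron"}
kept, components = alert_subgraphs(edges, flagged)
```

A flagged node with no flagged neighbor (here, `cron`) is dropped by this rule, which is one reason a practical system might also pull in one-hop context around each alert.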
Original abstract

Provenance-based intrusion detection has emerged as a promising approach for analyzing complex attack behaviors through system-level provenance graphs. However, existing defense methods face an inherent granularity limitation. Node-centric detectors, which evaluate anomalies using entities' attributes and local structural patterns, may misclassify benign behavioral changes or configuration modifications as suspicious. In contrast, edge-centric detectors, which focus more on interactions, may lack sufficient contextual awareness of the involved entities, leading to missed detections when compromised entities perform seemingly ordinary operations. These analytical biases highlight a persistent gap between node-centric and edge-centric analyses. To mitigate this gap, we present PROVFUSION, a multi-view detection framework that integrates anomaly signals from three distinct views (i.e., attribute, structure, and causality). The framework fuses heterogeneous anomaly signals through lightweight fusion schemes and determines the final anomaly decisions through a voting-based integration process, providing a more consistent and context-aware assessment of system behavior. This design enables PROVFUSION to capture both entity-level deviations and interaction-level anomalies within a consistent analytic pipeline. Experiments on nine widely used benchmark datasets demonstrate that PROVFUSION achieves higher detection accuracy and lower false-positive rates than single node- and edge-centric baselines, maintaining stable performance across scenarios. Overall, the results suggest that our multi-view anomaly fusion together with voting-based decision aggregation offers a practical and effective direction for advancing provenance-based intrusion detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces PROVFUSION, a multi-view fusion framework for provenance-based intrusion detection that integrates anomaly signals from attribute, structure, and causality views of system provenance graphs. It employs lightweight fusion schemes followed by voting-based decision aggregation to address granularity limitations of existing node-centric and edge-centric detectors. Experiments on nine widely used benchmark datasets are reported to show higher detection accuracy and lower false-positive rates than single-view baselines, with stable performance across scenarios.

Significance. If the claimed performance gains are attributable to complementary signals across the three views rather than implementation artifacts, the work offers a practical direction for improving robustness in provenance-based IDS by mitigating analytical biases of single-granularity approaches. The multi-benchmark evaluation is a strength that supports generalizability claims. However, the absence of methodological specifics limits assessment of whether the framework truly advances the state of the art beyond design intuition.

major comments (2)
  1. [Abstract] Abstract: The central performance claim (higher accuracy and lower FPR than node- and edge-centric baselines on nine benchmarks) is presented without any description of the fusion operator, voting mechanics, statistical tests for significance, or class-imbalance handling. This omission is load-bearing because the abstract provides no derivation or controls to attribute gains specifically to multi-view fusion rather than other factors.
  2. [Abstract] Abstract (and presumed methodology section): No equations or analysis are given for anomaly signal extraction from the attribute/structure/causality views or for verifying their independence/complementarity. In DAG provenance graphs, structure and causality views often share information; without an ablation or error-correlation study, the assumption that lightweight fusion plus voting yields net improvement remains unverified and directly underpins the superiority claim over baselines.
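The error-correlation study requested above can be prototyped by marking, for each view, the nodes it gets wrong and computing pairwise Pearson correlations of those error indicators. The per-view predictions below are hypothetical. A strongly positive correlation (here, between attribute and structure) means two views tend to fail on the same nodes, so fusing them adds little; near-zero or negative correlation is what makes fusion and voting pay off.

```python
import numpy as np

def error_correlation(preds: dict, labels: np.ndarray):
    """Pairwise Pearson correlation of per-view error indicators (1 = view is wrong on a node)."""
    names = list(preds)
    errors = np.array([(np.asarray(preds[n]) != labels).astype(float) for n in names])
    # Note: a view with zero errors has zero variance, and its row becomes NaN.
    return names, np.corrcoef(errors)

labels = np.array([0, 0, 0, 1, 1, 0, 1, 0])
preds = {  # hypothetical binary decisions per view
    "attribute": np.array([0, 1, 0, 1, 0, 0, 1, 0]),
    "structure": np.array([0, 1, 0, 1, 0, 0, 1, 1]),
    "causality": np.array([1, 0, 0, 1, 1, 0, 0, 0]),
}
names, corr = error_correlation(preds, labels)
```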

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below. Where the comments identify opportunities for clarification, we have made or will make revisions to strengthen the presentation of our multi-view fusion approach.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central performance claim (higher accuracy and lower FPR than node- and edge-centric baselines on nine benchmarks) is presented without any description of the fusion operator, voting mechanics, statistical tests for significance, or class-imbalance handling. This omission is load-bearing because the abstract provides no derivation or controls to attribute gains specifically to multi-view fusion rather than other factors.

    Authors: We agree that the abstract, due to length constraints, does not detail the fusion operator or voting mechanics. The full manuscript describes the lightweight fusion as a combination of normalized anomaly scores from the three views (via averaging or min-max schemes) followed by a majority-voting aggregation for the final decision. Evaluation uses standard metrics including accuracy, FPR, and F1-score across the nine benchmarks, with repeated runs to assess stability; class imbalance is addressed by reporting FPR and F1 rather than relying on accuracy alone. We will revise the abstract to include a concise description of the fusion and voting process and reference the evaluation protocol. This revision directly addresses the attribution concern while preserving the brevity of the abstract. revision: yes

  2. Referee: [Abstract] Abstract (and presumed methodology section): No equations or analysis are given for anomaly signal extraction from the attribute/structure/causality views or for verifying their independence/complementarity. In DAG provenance graphs, structure and causality views often share information; without an ablation or error-correlation study, the assumption that lightweight fusion plus voting yields net improvement remains unverified and directly underpins the superiority claim over baselines.

    Authors: The methodology section details the anomaly signal extraction: the attribute view computes deviations using statistical and feature-based scores on entity properties; the structure view identifies anomalies via local neighborhood patterns and graph metrics; and the causality view examines deviations in causal dependency chains within the DAG. While the abstract omits equations for brevity, they appear in the main text. We acknowledge that structure and causality views can overlap in DAGs, yet the empirical gains on nine diverse benchmarks indicate that the three views provide complementary signals when fused. We agree that an explicit ablation study and error-correlation analysis would strengthen verification of complementarity and will add these to the revised manuscript, including per-view performance breakdowns and correlation matrices of detection errors. revision: yes
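The class-imbalance point in the first response can be made concrete with the standard confusion-matrix formulas for FPR, F1, and the Matthews correlation coefficient (MCC, the metric reported in Figure 3). The counts below are hypothetical, chosen to resemble provenance data in which attack nodes are a tiny fraction of all nodes.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, false-positive rate, F1, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # Convention: F1 is 0 when there are no positives and no predictions.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any marginal is empty (denominator 0).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "fpr": fpr, "f1": f1, "mcc": mcc}

# Hypothetical imbalanced run: 10 attack nodes among 100,000.
m = binary_metrics(tp=8, fp=20, tn=99_970, fn=2)
```

Here accuracy is about 0.9998 even though fewer than a third of the alerts are real attacks; F1 (about 0.42) and MCC (about 0.48) expose that gap, which is why imbalance-aware metrics matter for this evaluation.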

Circularity Check

0 steps flagged

No circularity: empirical validation of design choice on external benchmarks

full rationale

The paper introduces PROVFUSION as a multi-view framework fusing attribute, structure, and causality anomaly signals via lightweight schemes and voting, then reports higher accuracy and lower FPR than node/edge baselines on nine independent benchmark datasets. No equations, parameter fits, or derivations are presented that reduce the claimed gains to quantities defined by the same inputs. The framework is an explicit design choice whose performance is measured against external data; no self-citation chains, self-definitional steps, or fitted-input predictions appear in the described pipeline. The result is therefore self-contained against the benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, background axioms, or newly postulated entities; the framework is presented as an engineering combination of existing provenance analysis perspectives.

pith-pipeline@v0.9.0 · 5550 in / 1117 out tokens · 26928 ms · 2026-05-10T11:17:29.930549+00:00 · methodology


Reference graph

Works this paper leans on

82 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    Nodoze: Combatting threat alert fatigue with automated provenance triage,

    W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, and A. Bates, “Nodoze: Combatting threat alert fatigue with automated provenance triage,” in 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society, 2019

  2. [2]

    Tactical provenance analysis for endpoint detection and response systems,

    W. U. Hassan, A. Bates, and D. Marino, “Tactical provenance analysis for endpoint detection and response systems,” in 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020. IEEE, 2020, pp. 1172–1189

  3. [3]

    Towards a timely causality analysis for enterprise security,

    Y. Liu, M. Zhang, D. Li, K. Jee, Z. Li, Z. Wu, J. Rhee, and P. Mittal, “Towards a timely causality analysis for enterprise security,” in 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018. The Internet Society, 2018

  4. [4]

    Sok: History is a vast early warning system: Auditing the provenance of system intrusions,

    M. Inam, Y. Chen, A. Goyal, J. Liu, J. Mink, N. Michael, S. Gaur, A. Bates, and W. Hassan, “Sok: History is a vast early warning system: Auditing the provenance of system intrusions,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023

  5. [5]

    HERCULE: attack story reconstruction via community discovery on correlated log graph,

    K. Pei, Z. Gu, B. Saltaformaggio, S. Ma, F. Wang, Z. Zhang, L. Si, X. Zhang, and D. Xu, “HERCULE: attack story reconstruction via community discovery on correlated log graph,” in Proceedings of the 32nd Annual Conference on Computer Security Applications, ACSAC 2016, Los Angeles, CA, USA, December 5-9, 2016, S. Schwab, W. K. Robertson, and D. Balzarotti, E...

  6. [6]

    A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,

    A. Alshamrani, S. Myneni, A. Chowdhary, and D. Huang, “A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities,” IEEE Commun. Surv. Tutorials, vol. 21, no. 2, pp. 1851–1877, 2019

  7. [7]

    HOLMES: real-time APT detection through correlation of suspicious information flows,

    S. M. Milajerdi, R. Gjomemo, B. Eshete, R. Sekar, and V. N. Venkatakrishnan, “HOLMES: real-time APT detection through correlation of suspicious information flows,” in 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. IEEE, 2019, pp. 1137–1152

  9. [9]

    Towards scalable cluster auditing through grammatical inference over provenance graphs,

    W. U. Hassan, M. Lemay, N. Aguse, A. Bates, and T. Moyer, “Towards scalable cluster auditing through grammatical inference over provenance graphs,” in 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018. The Internet Society, 2018

  10. [10]

    SHADEWATCHER: recommendation-guided cyber threat analysis using system audit records,

    J. Zeng, X. Wang, J. Liu, Y. Chen, Z. Liang, T. Chua, and Z. L. Chua, “SHADEWATCHER: recommendation-guided cyber threat analysis using system audit records,” in 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22-26, 2022. IEEE, 2022, pp. 489–506

  11. [11]

    You are what you do: Hunting stealthy malware via data provenance analysis,

    Q. Wang, W. U. Hassan, D. Li, K. Jee, X. Yu, K. Zou, J. Rhee, Z. Chen, W. Cheng, C. A. Gunter, and H. Chen, “You are what you do: Hunting stealthy malware via data provenance analysis,” in 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. The Internet Society, 2020

  12. [12]

    Unicorn: Runtime provenance-based detector for advanced persistent threats,

    X. Han, T. F. J. Pasquier, A. Bates, J. Mickens, and M. I. Seltzer, “Unicorn: Runtime provenance-based detector for advanced persistent threats,” in 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. The Internet Society, 2020

  13. [13]

    Edgetorrent: Real-time temporal graph representations for intrusion detection,

    I. J. King, X. Shu, J. Jang, K. Eykholt, T. Lee, and H. H. Huang, “Edgetorrent: Real-time temporal graph representations for intrusion detection,” in Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, ser. RAID ’23. Association for Computing Machinery, 2023, pp. 77–91

  14. [14]

    How Powerful are Graph Neural Networks?

    K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” arXiv preprint arXiv:1810.00826, 2018

  15. [15]

    Provg-searcher: A graph representation learning approach for efficient provenance graph search,

    E. Altinisik, F. Deniz, and H. T. Sencar, “Provg-searcher: A graph representation learning approach for efficient provenance graph search,” in Proceedings of the 2023 ACM SIGSAC conference on computer and communications security, 2023, pp. 2247–2261

  16. [16]

    PROGRAPHER: an anomaly detection system based on provenance graph embedding,

    F. Yang, J. Xu, C. Xiong, Z. Li, and K. Zhang, “PROGRAPHER: an anomaly detection system based on provenance graph embedding,” in 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023, J. A. Calandrino and C. Troncoso, Eds. USENIX Association, 2023, pp. 4355–4372

  17. [17]

    Provenance-based intrusion detection: opportunities and challenges,

    X. Han, T. Pasquier, and M. Seltzer, “Provenance-based intrusion detection: opportunities and challenges,” in Proceedings of the 10th USENIX Conference on Theory and Practice of Provenance, ser. TaPP’18. USA: USENIX Association, 2018, p. 3

  18. [18]

    Magic: Detecting advanced persistent threats via masked graph representation learning,

    Z. Jia, Y. Xiong, Y. Nan, Y. Zhang, J. Zhao, and M. Wen, “Magic: Detecting advanced persistent threats via masked graph representation learning,” 2023

  19. [19]

    Flash: A comprehensive approach to intrusion detection via provenance graph representation learning,

    M. Rehman, H. Ahmadi, and W. Hassan, “Flash: A comprehensive approach to intrusion detection via provenance graph representation learning,” in 2024 IEEE Symposium on Security and Privacy (SP)

  20. [20]

    Graph based anomaly detection and description: a survey,

    L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data mining and knowledge discovery, vol. 29, no. 3, pp. 626–688, 2015

  21. [21]

    A comprehensive survey on graph anomaly detection with deep learning,

    X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu, “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12012–12038, Dec. 2023

  22. [22]

    R-caid: Embedding root cause analysis within provenance-based intrusion detection,

    A. Goyal, G. Wang, and A. Bates, “R-caid: Embedding root cause analysis within provenance-based intrusion detection,” in 2024 IEEE Symposium on Security and Privacy (SP), 2024

  23. [23]

    SIGL: Securing software installations through deep graph learning,

    X. Han, X. Yu, T. Pasquier, D. Li, J. Rhee, J. Mickens, M. Seltzer, and H. Chen, “SIGL: Securing software installations through deep graph learning,” in 30th USENIX Security Symposium (USENIX Security 21), 2021

  24. [24]

    Nodlink: An online system for fine-grained apt attack detection and investigation,

    S. Li, F. Dong, X. Xiao, H. Wang, F. Shao, J. Chen, Y. Guo, X. Chen, and D. Li, “Nodlink: An online system for fine-grained apt attack detection and investigation,” in Proceedings 2024 Network and Distributed System Security Symposium. Internet Society, 2024

  25. [25]

    Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems,

    T. Bilot, B. Jiang, Z. Li, N. El Madhoun, K. Al Agha, A. Zouaoui, and T. Pasquier, “Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems,” in Security Symposium (USENIX Sec’25). USENIX, 2025

  26. [26]

    ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems,

    B. Jiang, T. Bilot, N. El Madhoun, K. Al Agha, A. Zouaoui, S. Iqbal, X. Han, and T. Pasquier, “ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems,” in Security Symposium (USENIX Sec’25). USENIX, 2025

  27. [27]

    Kairos: Practical intrusion detection and investigation using whole-system provenance,

    Z. Cheng, Q. Lv, J. Liang, Y. Wang, D. Sun, T. Pasquier, and X. Han, “Kairos: Practical intrusion detection and investigation using whole-system provenance,” 2023

  28. [28]

    Transparent computing engagement 3 data release, accessed 29th January 2025

    DARPA I2O, “Transparent computing engagement 3 data release, accessed 29th January 2025.” https://github.com/darpa-i2o/Transparent-Computing/blob/master/README-E3.md, 2018

  29. [29]

    Transparent computing engagement 5,

    DARPA 5, “Transparent computing engagement 5,” https://github.com/darpa-i2o/Transparent-Computing, 2019

  30. [30]

    Operationally transparent cyber (optc) data,

    DARPA OpTc, “Operationally transparent cyber (optc) data,” https://github.com/FiveDirections/OpTC-data, 2019

  31. [31]

    Semi-supervised classification with graph convolutional networks,

    T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 2017

  32. [32]

    Graph attention networks,

    P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” 2018

  33. [33]

    Windows event tracing,

    Don Marshall, “Windows event tracing,” https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/event-tracing-for-windows--etw-, 2021

  34. [34]

    Linux audit,

    Steve Grubb, “Linux audit,” https://linux.die.net/man/8/auditd, 2021

  35. [35]

    Dtrace,

    George V. Neville-Neil, “Dtrace,” https://wiki.freebsd.org/DTrace, 2018

  36. [36]

    Custos: Practical tamper-evident auditing of operating systems using trusted execution,

    R. Paccagnella, P. Datta, W. U. Hassan, A. Bates, C. W. Fletcher, A. Miller, and D. Tian, “Custos: Practical tamper-evident auditing of operating systems using trusted execution,” in27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. The Internet Society, 2020

  37. [37]

    Trustworthy Whole-System provenance for the linux kernel,

    A. Bates, D. J. Tian, K. R. Butler, and T. Moyer, “Trustworthy Whole-System provenance for the linux kernel,” in 24th USENIX Security Symposium (USENIX Security 15). Washington, D.C.: USENIX Association, Aug. 2015, pp. 319–334

  38. [38]

    Logging to the danger zone: Race condition attacks and defenses on system audit frameworks,

    R. Paccagnella, K. Liao, D. Tian, and A. Bates, “Logging to the danger zone: Race condition attacks and defenses on system audit frameworks,” in CCS ’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020, J. Ligatti, X. Ou, J. Katz, and G. Vigna, Eds. ACM, 2020, pp. 1551–1574

  39. [39]

    Practical whole-system provenance capture,

    T. F. J. Pasquier, X. Han, M. Goldstein, T. Moyer, D. M. Eyers, M. I. Seltzer, and J. Bacon, “Practical whole-system provenance capture,” in Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, September 24-27, 2017. ACM, 2017, pp. 405–418

  40. [40]

    Dynamic malware analysis with feature engineering and feature learning,

    Z. Zhang, P. Qi, and W. Wang, “Dynamic malware analysis with feature engineering and feature learning,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 01, 2020, pp. 1210–1217

  41. [41]

    Efficient Estimation of Word Representations in Vector Space

    T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013

  42. [42]

    Skip-gram word embeddings in hyperbolic space,

    M. Leimeister and B. J. Wilson, “Skip-gram word embeddings in hyperbolic space,” 2019

  43. [43]

    Neural message passing for quantum chemistry

    J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” CoRR, vol. abs/1704.01212, 2017

  44. [44]

    Knn model-based approach in classification,

    G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “Knn model-based approach in classification,” in On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, R. Meersman, Z. Tari, and D. C. Schmidt, Eds., 2003

  45. [45]

    The faiss library,

    M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou, “The faiss library,” 2024

  46. [46]

    Billion-scale similarity search with GPUs,

    J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2019

  47. [47]

    Graphmae: Self-supervised masked graph autoencoders,

    Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, and J. Tang, “Graphmae: Self-supervised masked graph autoencoders,” 2022

  48. [48]

    How attentive are graph attention networks?

    S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” arXiv preprint arXiv:2105.14491, 2021

  49. [49]

    Multilayer perceptron and neural networks,

    M.-C. Popescu, V. E. Balas, L. Perescu-Popescu, and N. Mastorakis, “Multilayer perceptron and neural networks,” WSEAS Trans. Cir. and Sys., vol. 8, no. 7, pp. 579–588, Jul. 2009

  50. [50]

    Dos and don’ts of machine learning in computer security,

    D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, and K. Rieck, “Dos and don’ts of machine learning in computer security,” 2021

  51. [51]

    Learning from data: concepts, theory, and methods,

    L. John Wiley & Sons, “Learning from data: concepts, theory, and methods,” 2007

  52. [52]

    Fast memory-efficient anomaly detection in streaming heterogeneous graphs,

    E. A. Manzoor, S. M. Milajerdi, and L. Akoglu, “Fast memory-efficient anomaly detection in streaming heterogeneous graphs,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, B. Krishnapuram, M. Shah, A. J. Smola, C. C. Aggarwal, D. Shen, and R. Rastogi, Eds. ...

  53. [53]

    ATLAS: A sequence-based learning approach for attack investigation,

    A. Alsaheel, Y. Nan, S. Ma, L. Yu, G. Walkup, Z. B. Celik, X. Zhang, and D. Xu, “ATLAS: A sequence-based learning approach for attack investigation,” in 30th USENIX Security Symposium (USENIX Security 21), 2021

  54. [54]

    WATSON: abstracting behaviors from audit logs via aggregation of contextual semantics,

    J. Zeng, Z. L. Chua, Y. Chen, K. Ji, Z. Liang, and J. Mao, “WATSON: abstracting behaviors from audit logs via aggregation of contextual semantics,” in 28th Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February 21-25, 2021. The Internet Society, 2021

  55. [55]

    The freebsd project,

    “The freebsd project,” https://www.freebsd.org/, 2025

  56. [56]

    The linux kernel archives,

    “The linux kernel archives,” https://www.kernel.org/, 2025

  57. [57]

    Android open source project,

    “Android open source project,” https://source.android.com/, 2025

  58. [58]

    Slot: Provenance-driven apt detection through graph reinforcement learning,

    W. Qiao, Y. Feng, T. Li, Z. Ma, Y. Shen, J. Ma, and Y. Liu, “Slot: Provenance-driven apt detection through graph reinforcement learning,” 2025

  59. [59]

    Ocr-apt: Reconstructing apt stories from audit logs using subgraph anomaly detection and llms,

    A. Aly, E. Mansour, and A. Youssef, “Ocr-apt: Reconstructing apt stories from audit logs using subgraph anomaly detection and llms,” ser. CCS ’25

  60. [60]

    Incorporating gradients to rules: Towards lightweight, adaptive provenance-based intrusion detection,

    L. Wang, X. Shen, W. Li, Z. Li, R. Sekar, H. Liu, and Y. Chen, “Incorporating gradients to rules: Towards lightweight, adaptive provenance-based intrusion detection,” arXiv preprint arXiv:2404.14720, 2024

  61. [61]

    Auc: a statistically consistent and more discriminating measure than accuracy,

    C. X. Ling, J. Huang, and H. Zhang, “Auc: a statistically consistent and more discriminating measure than accuracy,” in Proceedings of the 18th International Joint Conference on Artificial Intelligence, ser. IJCAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003, pp. 519–524

  62. [62]

    D. Chicco, M. J. Warrens, and G. Jurman, “The Matthews correlation coefficient (MCC) is more informative than Cohen’s kappa and Brier score in binary classification assessment,” IEEE Access, vol. 9, pp. 78368–78381, 2021

  63. [63]

    D. T. Computing, “Ground truth file,” https://drive.google.com/file/d/1mrs4LWkGk-3zA7t7v8zrhm0yEDHe57QU/view, 2018

  64. [64]

    REAPr, “REAPr label set,” https://bitbucket.org/sts-lab/reapr-ground-truth/src/master/, 2024

  65. [65]

    J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier, 2011

  66. [66]

    P. Röchner, H. O. Marques, R. J. G. B. Campello, A. Zimek, and F. Rothlauf, Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers. Springer Nature Switzerland, Oct. 2024, pp. 215–222

  67. [67]

    J. Liu, D. Cai, and X. He, “Gaussian mixture model with local consistency,” in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, ser. AAAI’10. AAAI Press, 2010, pp. 512–517

  68. [68]

    H. Otneim and D. Tjøstheim, “The locally Gaussian density estimator for multivariate data,” Statistics and Computing, vol. 27, no. 6, pp. 1595–1616, Nov. 2017

  69. [69]

    B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt, “Support vector method for novelty detection,” ser. NIPS’99. Cambridge, MA, USA: MIT Press, 1999, pp. 582–588

  70. [70]

    F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413–422

  71. [71]

    R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” 2019. [Online]. Available: https://arxiv.org/abs/1901.03407

  72. [72]

    A. Goyal, X. Han, G. Wang, and A. Bates, “Sometimes, you aren’t what you do: Mimicry attacks against provenance graph host intrusion detection systems,” in 30th Annual Network and Distributed System Security Symposium, NDSS 2023, San Diego, California, USA, February 27 - March 3, 2023. The Internet Society, 2023

  73. [73]

    S. Wang, Z. Wang, T. Zhou, H. Sun, X. Yin, D. Han, H. Zhang, X. Shi, and J. Yang, “THREATRACE: Detecting and tracing host-based threats in node level through provenance graph learning,” IEEE Trans. Inf. Forensics Secur., vol. 17, pp. 3972–3987, 2022

  74. [74]

    N. Peng, H. Poon, C. Quirk, K. Toutanova, and W.-t. Yih, “Cross-sentence n-ary relation extraction with graph LSTMs,” 2017

  75. [75]

    E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” CoRR, vol. abs/2006.10637, 2020

A. Entity and Event Type Specification

Table 8 enumerates the entity and event types used in constructing the provenance graph G = (V, E).

Table 8 columns: Category, Types Considered in PROVFUSION. First row: Entity...

Figure 7: The attack subgraph generated by PROVFUSION for the CLEARSCOPE-E3 dataset, highlighting the key malicious events and their causal relationships. In this graph, circles represent FILE, squares represent PROCESS, and diamonds represent NETWORK nodes. Entities marked in red denote true positives detected by PROVFUSION. (Figure labels: Recvfrom events between Org.mozilla.fennec-firefox-dev and 166.199.230.185.80; one node annotated "5 of 7 voted anomalous".)

Beyond aggregate metric...

- Well-motivated problem with a compelling diagnosis of why single-view detectors fail in complementary ways
- Clear empirical gains over both individual baselines and naive ensemble combinations, validated through extensive ablations
