arxiv: 2606.31557 · v1 · pith:IEIHHFCInew · submitted 2026-06-30 · 💻 cs.CR · cs.AI

CVE-TTP KG: Knowledge Graph Linking Software Vulnerabilities to Attack Behaviors

Pith reviewed 2026-07-01 05:24 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords knowledge graph · CVE · MITRE ATT&CK · vulnerability · threat intelligence · relation extraction · cybersecurity · transformer models

The pith

A knowledge graph connects CVE vulnerabilities to MITRE ATT&CK tactics and techniques through classification and relation extraction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a CVE-TTP Knowledge Graph that links entries from vulnerability databases to attacker behaviors described in the MITRE ATT&CK framework. Transformer models classify tactics and techniques from CVE text, and separate extraction models identify entities and relations to populate the graph. A sympathetic reader would care because standard vulnerability records give technical details but no direct map to how adversaries actually exploit them, which limits prioritized defense and incident response.

Core claim

We construct a CVE-TTP Knowledge Graph that links CVEs to tactics and techniques using classification and relation extraction. Transformer-based models are developed for behavior identification, with CySecBERT achieving macro F1-scores of 87.71% (techniques) and 96.16% (tactics). We created an annotated dataset with 24,820 entities and 43,608 relations for entity and relation extraction. The pipeline-based approach achieves macro F1-scores of 0.86 (entity extraction) and 0.99 (relation extraction), while a span-based joint model achieves 0.78. These outputs are integrated into a Neo4j-based Cyber Threat Knowledge Graph, enabling structured visualization of vulnerabilities.
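
The scores quoted above are macro F1, which averages per-class F1 with equal weight, so a rare technique counts as much as a common one. A minimal sketch of that computation, assuming single-label predictions; the labels below are toy values for illustration, not data from the paper, and the technique IDs serve only as example class names.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative
    scores = []
    for c in sorted(set(gold) | set(pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels (illustrative only): three classes, one of them never predicted.
gold = ["T1190", "T1190", "T1190", "T1059", "T1068"]
pred = ["T1190", "T1190", "T1059", "T1059", "T1190"]
score = macro_f1(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

In this toy case accuracy is 0.6 while macro F1 is about 0.44, because the class that is never predicted contributes an F1 of zero. This is why macro F1 is a stricter summary than accuracy when class frequencies are skewed.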

What carries the argument

The CVE-TTP Knowledge Graph, populated by transformer classification of tactics/techniques plus pipeline and joint models for entity-relation extraction from CVE descriptions.

If this is right

  • Vulnerability data gains direct behavioral context, allowing defenders to interpret threats beyond technical severity scores.
  • The Neo4j graph supports structured queries and visualization that connect specific CVEs to sequences of attacker actions.
  • Automated extraction pipelines can scale the mapping process beyond manual ATT&CK curation.
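  • Joint entity-relation models could remove the need for separate pipeline stages when processing new vulnerability reports, although the reported joint model (macro F1 0.78) trails the pipeline (0.86 entity, 0.99 relation).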
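
As a rough illustration of how extracted triples might be written to Neo4j, the sketch below turns one (head, relation, tail) triple into an idempotent Cypher `MERGE` statement with parameters. The node labels follow the entity types named in the Figure 6 caption; the `id` property and the relationship type `USES_TECHNIQUE` are assumptions made for this example, not the paper's schema, and the CVE identifier is a placeholder.

```python
import re

# Assumed schema for illustration; the paper's actual graph schema may differ.
ALLOWED_LABELS = {"CVE", "CWE", "Product", "Tactic", "Technique"}
_REL_TYPE = re.compile(r"^[A-Z_]+$")

def merge_statement(head, rel, tail):
    """Return (cypher, params) that upserts one triple.

    head and tail are (label, identifier) pairs. Labels and relationship
    types cannot be passed as Cypher parameters, so they are validated
    against a whitelist and interpolated; identifiers go through parameters.
    """
    (h_label, h_id), (t_label, t_id) = head, tail
    if h_label not in ALLOWED_LABELS or t_label not in ALLOWED_LABELS:
        raise ValueError("unknown node label")
    if not _REL_TYPE.match(rel):
        raise ValueError("invalid relationship type")
    cypher = (
        f"MERGE (h:{h_label} {{id: $head_id}}) "
        f"MERGE (t:{t_label} {{id: $tail_id}}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return cypher, {"head_id": h_id, "tail_id": t_id}

# Placeholder identifiers, not real records.
cypher, params = merge_statement(
    ("CVE", "CVE-0000-0001"), "USES_TECHNIQUE", ("Technique", "T1190")
)
```

With the official Neo4j Python driver, each pair could then be passed to `session.run(cypher, params)`. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes or edges.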

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Security teams could query the graph to surface all CVEs that enable a chosen technique, speeding patch prioritization during campaigns.
  • The same extraction approach might extend to other unstructured sources such as threat reports or exploit code to grow the graph automatically.
  • If the links prove reliable, incident responders could use the graph to hypothesize attacker next steps from an observed CVE.
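
The first and third extensions above can be sketched without a database, assuming the graph has been exported as (CVE, technique) pairs. The edges below are placeholders, not data from the paper, and treating techniques that share a CVE as candidate "next steps" is a crude proxy chosen for this example rather than anything the paper proposes.

```python
from collections import defaultdict

# Toy edges with placeholder CVE identifiers (illustrative only).
EDGES = [
    ("CVE-0000-0001", "T1190"),
    ("CVE-0000-0002", "T1190"),
    ("CVE-0000-0002", "T1059"),
    ("CVE-0000-0003", "T1068"),
]

def build_indexes(edges):
    """Index the edge list in both directions for constant-time lookups."""
    by_technique, by_cve = defaultdict(set), defaultdict(set)
    for cve, technique in edges:
        by_technique[technique].add(cve)
        by_cve[cve].add(technique)
    return by_technique, by_cve

def cves_enabling(by_technique, technique):
    """All CVEs linked to the chosen technique, sorted for stable output."""
    return sorted(by_technique.get(technique, ()))

def co_occurring_techniques(by_technique, by_cve, technique):
    """Techniques that share at least one CVE with the given technique."""
    related = set()
    for cve in by_technique.get(technique, ()):
        related |= by_cve[cve]
    related.discard(technique)
    return sorted(related)

by_technique, by_cve = build_indexes(EDGES)
patch_first = cves_enabling(by_technique, "T1190")
next_steps = co_occurring_techniques(by_technique, by_cve, "T1190")
```

Here `patch_first` lists the two placeholder CVEs linked to `T1190`, and `next_steps` contains only `T1059`. Either result is only as trustworthy as the underlying links.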

Load-bearing premise

The manually created annotations of 24,820 entities and 43,608 relations correctly capture how real CVE descriptions map to ATT&CK behaviors without major labeling errors or selection bias.

What would settle it

Run the trained models on a fresh set of CVEs whose real-world exploitation incidents are already documented with observed ATT&CK techniques, then check whether the graph links match the documented behaviors at rates above random.
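
One way to make "above random" concrete is a permutation baseline: score the predicted links against the documented techniques, then compare with the mean score obtained when the same predictions are shuffled across CVEs. The sketch below assumes both sources are available as dictionaries mapping a CVE to a set of technique IDs; the hit-rate metric, the helper names, and the toy data are choices made for this example, not the paper's protocol.

```python
import random

def hit_rate(predicted, documented):
    """Fraction of documented CVEs whose predicted techniques overlap the documented ones."""
    hits = sum(
        1 for cve, techs in documented.items() if predicted.get(cve, set()) & techs
    )
    return hits / len(documented)

def shuffled_baseline(predicted, documented, trials=1000, seed=0):
    """Mean hit rate when the predictions are randomly reassigned among CVEs."""
    rng = random.Random(seed)
    cves = list(documented)
    preds = [predicted.get(c, set()) for c in cves]
    total = 0.0
    for _ in range(trials):
        rng.shuffle(preds)
        total += hit_rate(dict(zip(cves, preds)), documented)
    return total / trials

# Toy data with placeholder CVE identifiers (illustrative only).
documented = {
    "CVE-0000-0001": {"T1190"},
    "CVE-0000-0002": {"T1059"},
    "CVE-0000-0003": {"T1068"},
    "CVE-0000-0004": {"T1190"},
}
predicted = {
    "CVE-0000-0001": {"T1190"},
    "CVE-0000-0002": {"T1059"},
    "CVE-0000-0003": {"T1190"},
    "CVE-0000-0004": {"T1190"},
}
observed = hit_rate(predicted, documented)
baseline = shuffled_baseline(predicted, documented)
```

Shuffling preserves how often each technique is predicted, so a model that simply outputs the most common technique gains little over the baseline. A real study would also want a significance test and a much larger sample.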

Figures

Figures reproduced from arXiv: 2606.31557 by Antonino Nocera, Basant Agarwal, Dincy R. Arikkat, Serena Nicolazzo, Swati Yadav, Vinod P.

Figure 1: Architecture of the CVE-TTP Knowledge Graph. The construction process consists of four stages: …
Figure 2: Year-wise distribution of CVEs. The plot depicts the annual distribution of collected vulnerabilities, showing …
Figure 3: Workflow of pipeline-based entity and relation extraction.
Figure 4: Workflow for joint entity and relation extraction.
Figure 5: CVE-TTP KG generated by the pipeline model, illustrating entity associations alongside semantic inconsis…
Figure 6: CVE-TTP KG generated by the joint model includes CVE, CWE, product, tactic, and technique entities with …
Figure 7: Confusion matrix for the Entity Recognition stage of the pipeline approach.
Figure 8: Confusion matrix for the Relation Extraction stage of the pipeline approach.
Figure 9: Confusion matrix for entities in the joint model.
Figure 10: Confusion matrix for relations in the joint model.

Original abstract

In the evolving threat landscape, adversaries exploit software vulnerabilities to launch sophisticated attacks, challenging traditional defenses. Although databases like CVE and NVD provide detailed technical information, they often lack links to attacker behaviors such as tactics and techniques, limiting effective threat interpretation and response. This work bridges this gap by connecting vulnerabilities with behavioral patterns from the MITRE ATT&CK framework. We construct a CVE-TTP Knowledge Graph that links CVEs to tactics and techniques using classification and relation extraction. Transformer-based models are developed for behavior identification, with CySecBERT achieving macro F1-scores of 87.71% (techniques) and 96.16% (tactics). Also, we created an annotated dataset with 24,820 entities and 43,608 relations for entity and relation extraction. The pipeline-based approach achieves macro F1-scores of 0.86 (entity extraction) and 0.99 (relation extraction), while a span-based joint model achieves 0.78. These outputs are integrated into a Neo4j-based Cyber Threat Knowledge Graph, enabling structured visualization of vulnerabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper constructs a CVE-TTP Knowledge Graph linking CVEs from NVD to MITRE ATT&CK tactics and techniques via transformer-based classification and relation extraction. It reports creating a manually annotated dataset containing 24,820 entities and 43,608 relations; a pipeline approach yields macro F1 of 0.86 for entity extraction and 0.99 for relation extraction, while CySecBERT achieves 87.71% (techniques) and 96.16% (tactics) macro F1. The resulting graph is stored in Neo4j for visualization and threat analysis.

Significance. If the annotations prove reliable, the work would supply a concrete resource for mapping vulnerabilities to attacker behaviors, supporting automated threat intelligence pipelines that combine CVE technical details with ATT&CK behavioral context.

major comments (1)
  1. [Abstract / Dataset section] Abstract and dataset description: the central performance claims (F1 0.86/0.99 for extraction, 87.71%/96.16% for classification) and the resulting CVE-TTP KG all depend on the quality of the 24,820-entity / 43,608-relation manually annotated dataset, yet no information is supplied on annotation guidelines, number of annotators, inter-annotator agreement, adjudication procedure, or external validation against existing ATT&CK mappings. Without these details the reported metrics cannot be interpreted as evidence of model capability rather than annotation artifacts.
minor comments (1)
  1. [Abstract] The abstract states specific F1 scores but omits any reference to train/test splits, baseline models, or statistical tests; these should be added to allow evaluation of the reported numbers.
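
Inter-annotator agreement, which the major comment asks for, is commonly reported as Cohen's kappa: observed agreement between two annotators, corrected for the agreement expected by chance. A minimal sketch for two annotators labeling the same items; the label names and values are toy data, not drawn from the paper's dataset.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical class throughout
    return (observed - expected) / (1 - expected)

# Toy annotations (illustrative only): the annotators disagree on one item.
annotator_a = ["Technique", "Technique", "Tactic", "Product", "Product", "O"]
annotator_b = ["Technique", "Tactic", "Tactic", "Product", "Product", "O"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Raw agreement here is 5/6, but kappa comes out near 0.78 once chance agreement (0.25) is removed. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.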

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed feedback on the dataset description. We agree that transparency regarding the annotation process is necessary to properly interpret the reported metrics and will revise the manuscript accordingly.

  1. Referee: [Abstract / Dataset section] Abstract and dataset description: the central performance claims (F1 0.86/0.99 for extraction, 87.71%/96.16% for classification) and the resulting CVE-TTP KG all depend on the quality of the 24,820-entity / 43,608-relation manually annotated dataset, yet no information is supplied on annotation guidelines, number of annotators, inter-annotator agreement, adjudication procedure, or external validation against existing ATT&CK mappings. Without these details the reported metrics cannot be interpreted as evidence of model capability rather than annotation artifacts.

    Authors: We agree that the current manuscript does not supply the requested details on the annotation process. In the revised version we will add a dedicated subsection in the Dataset section that describes the annotation guidelines, the number of annotators, inter-annotator agreement statistics, the adjudication procedure, and any external validation steps performed against existing ATT&CK mappings. This addition will enable readers to evaluate the dataset quality independently of the model results.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper constructs a manually annotated CVE-TTP dataset (24,820 entities, 43,608 relations) and reports standard supervised evaluation metrics (F1 scores) for entity/relation extraction and classification models trained on it. No equations, derivations, or claims reduce by construction to their own inputs; no self-citation chains justify uniqueness theorems or ansatzes; no fitted parameters are relabeled as independent predictions. The reported scores measure held-out test performance against the annotated labels rather than quantities the models define for themselves, so this reads as a conventional, non-circular dataset-plus-model ML paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumptions that the ATT&CK framework mappings are meaningful for CVE text and that the annotated dataset is reliable for training generalizable models. Since only the abstract is available, these are inferred from the described pipeline and results.

axioms (2)
  • domain assumption: The MITRE ATT&CK framework provides a comprehensive and accurate representation of attacker behaviors that can be reliably mapped to CVE descriptions.
    The linking task assumes ATT&CK is the appropriate behavioral ontology.
  • domain assumption: Manually annotated data for entity and relation extraction in cybersecurity text can be created at scale without prohibitive errors.
    The reported F1 scores depend on the quality of the 24,820-entity dataset.
invented entities (1)
  • CVE-TTP Knowledge Graph (no independent evidence)
    purpose: Structured representation linking vulnerabilities to attack behaviors
    The graph is the constructed output of the pipeline; no external validation or independent evidence is mentioned.

