CyberMetric: A benchmark dataset for evaluating large language models knowledge in cybersecurity

Tihanyi, N · 2023 · arXiv 2402.07688

2 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

cs.CR · 2026-04-21 · conditional · novelty 8.0

A new benchmark shows frontier LLMs achieve only 3.8% average recall identifying malicious events from raw logs and fail to meet 50% recall thresholds on most tactics.

LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations

cs.CR · 2026-04-07 · unverdicted · novelty 5.0

LanG presents a governance-aware agentic AI platform for unified security operations that reports strong performance on incident correlation, rule generation, attack reconstruction, and AI safety guardrails in an open-source package.

citing papers explorer

Showing 2 of 2 citing papers.

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps cs.CR · 2026-04-21 · conditional · none · ref 3
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations cs.CR · 2026-04-07 · unverdicted · none · ref 62