Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CR (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Sieve uses an LLM to generate executable queries from natural language security questions grounded by auto-extracted log-format context, cutting error rates over 3x on complex temporal and cross-event tasks versus manual scripting across 133 queries and 5 log types.
citing papers explorer
- Benchmarking and Exploring the Capabilities of LLMs for Attack Investigations
AuditBench is a new benchmark of audit logs from 50+ malicious and benign scenarios that evaluates five LLMs on four security investigation tasks and analyzes their performance and error profiles.
- Parser-Free Querying of Security Logs
Sieve uses an LLM to generate executable queries from natural language security questions grounded by auto-extracted log-format context, cutting error rates over 3x on complex temporal and cross-event tasks versus manual scripting across 133 queries and 5 log types.