Pith · machine review for the scientific record

arxiv: 2605.11504 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.CR

CTFusion: A CTF-based Benchmark for LLM Agent Evaluation

Dongjun Lee, Ga-eun Bae, Insu Yun

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:33 UTC · model grok-4.3

classification: 💻 cs.LG · cs.CR
keywords: CTF · LLM agents · benchmark · data contamination · cybersecurity · evaluation framework · live CTF

The pith

Reused CTF challenges allow data contamination that inflates LLM agent scores, which CTFusion fixes by streaming evaluations from live events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing CTF benchmarks reuse old challenges, enabling agents to cheat via web search or memorized solutions and producing unreliable results for cybersecurity tasks. Experiments with an agent equipped with search tools confirm that contamination occurs in practice on static benchmarks. CTFusion counters this by running on live CTF events, keeping each agent's session independent even under one team account, and forwarding only the first correct flag per challenge to limit disruption to the competition. The system is built as an MCP server for the common CTFd platform so it works with many events and agent designs. Tests across three LLMs, two agents, and five live CTFs indicate that this live approach yields more trustworthy assessments than reused datasets.
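
As a concrete reading of the forwarding rule, the sketch below keeps an independent solved-set per agent while letting the live competition see at most one correct flag per challenge. It is a minimal illustration of the stated design, not the paper's released code; the names (FlagRouter, _submit_to_live_ctf) are hypothetical, and the live-submission call is stubbed out.

```python
# Minimal sketch (hypothetical names) of first-flag-only forwarding with
# per-agent independence under one shared team account.
from dataclasses import dataclass, field


@dataclass
class FlagRouter:
    forwarded: dict[str, str] = field(default_factory=dict)    # challenge_id -> confirmed flag
    solved: dict[str, set[str]] = field(default_factory=dict)  # agent_id -> solved challenge_ids

    def submit(self, agent_id: str, challenge_id: str, flag: str) -> bool:
        """Handle one agent's flag submission for one challenge."""
        if challenge_id in self.forwarded:
            # The live CTF already accepted a flag for this challenge:
            # verify locally so the competition sees no further submissions.
            correct = flag == self.forwarded[challenge_id]
        else:
            # First attempt on this challenge goes to the live platform
            # through the shared team account.
            correct = self._submit_to_live_ctf(challenge_id, flag)
            if correct:
                self.forwarded[challenge_id] = flag
        if correct:
            # Each agent keeps its own solved-set, so a later agent still gets
            # credit even though its flag is never re-forwarded.
            self.solved.setdefault(agent_id, set()).add(challenge_id)
        return correct

    def _submit_to_live_ctf(self, challenge_id: str, flag: str) -> bool:
        # Stand-in for the real CTFd submission call; the platform's verdict
        # would be returned here in the actual framework.
        print(f"live submission for {challenge_id}")
        return True  # pretend the platform accepted the flag


router = FlagRouter()
router.submit("agent-a", "pwn-101", "flag{demo}")  # forwarded to the live CTF
router.submit("agent-b", "pwn-101", "flag{demo}")  # checked locally, not re-forwarded
```

Under this scheme each agent's score is computed from its own solved-set, while the shared team account submits each flag to the competition at most once.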

Core claim

CTFusion is a streaming evaluation framework built on live CTFs that preserves per-agent independence under a single team account and reduces competition impact by forwarding only the first correct flag per challenge, implemented as an MCP server on CTFd to support diverse events and agents.

What carries the argument

The CTFusion streaming framework on CTFd, which enforces per-agent independence and first-flag-only forwarding to prevent contamination and competition effects during live evaluations.
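
To make "an MCP server for the common CTFd platform" concrete, here is a sketch of what such a tool surface could look like, assuming the FastMCP helper from the MCP Python SDK and CTFd's REST endpoints /api/v1/challenges and /api/v1/challenges/attempt. The endpoint names, response fields, and environment variables are assumptions, and the sketch omits the first-flag-only routing shown earlier; it is not the paper's released server.

```python
# Sketch of an MCP tool surface over the CTFd REST API (assumed endpoints).
import os

import requests
from mcp.server.fastmcp import FastMCP

CTFD_URL = os.environ.get("CTFD_URL", "https://ctf.example.org")
HEADERS = {
    "Authorization": f"Token {os.environ.get('CTFD_TOKEN', '')}",
    "Content-Type": "application/json",
}

mcp = FastMCP("ctfd-eval")


@mcp.tool()
def list_challenges() -> list[dict]:
    """Return the live event's challenge list from CTFd."""
    resp = requests.get(f"{CTFD_URL}/api/v1/challenges", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]


@mcp.tool()
def submit_flag(challenge_id: int, flag: str) -> bool:
    """Submit a flag through the shared team account and return the verdict."""
    resp = requests.post(
        f"{CTFD_URL}/api/v1/challenges/attempt",
        headers=HEADERS,
        json={"challenge_id": challenge_id, "submission": flag},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["status"] == "correct"


if __name__ == "__main__":
    mcp.run()  # serves the tools to any MCP-compatible agent
```

Because the tools sit behind the MCP boundary, any agent design that speaks MCP can be evaluated against any CTFd-hosted event without per-agent platform accounts.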

Load-bearing premise

Live CTF events stay uncontaminated, and the combination of the independence rule and first-flag forwarding fully blocks data leakage and competition distortion.

What would settle it

An agent using web search succeeding on a CTFusion challenge whose live event had no prior public solutions or leaks.

Figures

Figures reproduced from arXiv: 2605.11504 by Dongjun Lee, Ga-eun Bae, Insu Yun.

Figure 1. Monthly distribution of CTF competitions (2025).
Figure 2. Success rates of D-CIPHER-WEB and D-CIPHER on NYU CTF Bench.
Figure 3. Evidence of direct flag retrieval by D-CIPHER-WEB for the “1nsayne” challenge.
Figure 4. Overview of the CTFusion framework architecture.
Figure 5. Performance comparison: Live CTFs vs NYU CTF Bench.
Figure 6. Default prompt for D-CIPHER-WEB.
Figure 7. Specialized prompt for pwn challenges in D-CIPHER-WEB.
Figure 8. Success rates across five Live CTFs and NYU CTF Bench.
Figure 10. Problem-solving rates of all model-agent combinations on UIUCTF.
Figure 11. Problem-solving rates of all model-agent combinations on WWCTF.
Figure 13. Problem-solving rates of all model-agent combinations on SCRIPTCTF.
Figure 14. Problem-solving rates for all model-agent pairs on 2023-Quals.
Figure 16. Problem-solving rates for all model-agent pairs on 2022-Quals.
Figure 18. Problem-solving rates for all model-agent pairs on 2021-Quals.
Figure 20. Problem-solving rates for all model-agent pairs on 2020-Quals.
Figure 22. Problem-solving rates for all model-agent pairs on 2019-Quals.
Figure 24. Problem-solving rates for all model-agent pairs on 2018-Quals.
Figure 26. Problem-solving rates for all model-agent pairs on 2017-Quals.
Original abstract

Recent advances in Large Language Models (LLMs) have enabled agentic systems for complex, multi-step tasks; cybersecurity is emerging as a prominent application. To evaluate such agents, researchers widely adopt Capture The Flag (CTF) benchmarks. However, current CTF benchmarks reuse existing challenges, which exposes them to data contamination and potential cheating. Notably, we confirmed these issues in practice by integrating web search tools into an existing agent. To address these limitations, we present CTFusion, a streaming evaluation framework built on Live CTFs. To achieve this, CTFusion preserves per-agent independence under a single team account and reduces competition impact by forwarding only the first correct flag per challenge. Moreover, we implement CTFusion as a Model Context Protocol (MCP) server on the widely used CTFd platform, which offers broad applicability to diverse CTF events and agent types. Through experiments with three LLMs, two agents, and five Live CTFs, we demonstrate that existing CTF benchmarks can be unreliable in assessing LLM-based agents, while CTFusion can serve as a robust solution for evaluating cybersecurity agents. We release CTFusion as open source to foster future research in this area.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that existing CTF benchmarks for LLM agents are unreliable due to data contamination and cheating (demonstrated via web-search tool integration experiments), and introduces CTFusion as a streaming framework for live CTFs. CTFusion achieves per-agent independence under a single team account and reduces competition impact by forwarding only the first correct flag per challenge; it is implemented as an MCP server on the CTFd platform. Experiments with three LLMs, two agents, and five live CTFs are used to support that CTFusion provides a robust alternative, with open-source release.

Significance. If the isolation and forwarding mitigations hold, this work addresses a critical and timely limitation in agent evaluation for cybersecurity, where reliable benchmarks are scarce. The open-source implementation on a widely used platform (CTFd) and the multi-LLM/multi-agent experimental setup are concrete strengths that could enable reproducible follow-up work and broader adoption in LLM agent research.

major comments (3)
  1. [CTFusion framework description (and abstract)] The central claim that CTFusion's two mitigations (per-agent independence under a shared team account + first-flag forwarding) fully neutralize both contamination and competition-impact problems requires stronger justification. The framework description does not specify how shared-account rate limits, platform logging, or sequential flag-submission order are prevented from creating observable differences between agents; without this, the 'robust solution' claim for live events rests on an untested isolation assumption.
  2. [Experiments (and abstract)] The experimental support for the unreliability of existing CTF benchmarks (via web-search integration) is only partially verifiable. The abstract and setup report results with three LLMs and two agents but omit full methods details, specific metrics, quantitative outcomes, or error analysis, weakening the load-bearing claim that current benchmarks are unreliable.
  3. [Live CTF setup and experiments] The assumption that live CTF events remain uncontaminated (and that the five selected events are representative) is not tested or discussed. Potential selection effects, prior exposure, or organizer-side leakage could still affect results, and no evidence is provided that the chosen live events avoid the contamination issues shown for static benchmarks.
minor comments (2)
  1. [Experiments] The abstract states experiments used 'five Live CTFs' but provides no list, table, or description of the specific events or challenges; adding this in the experimental section would improve reproducibility.
  2. [CTFusion implementation] Notation for agent independence and flag-forwarding logic could be clarified with a small diagram or pseudocode, as the current prose description leaves some implementation details ambiguous.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, indicating where we agree revisions are needed to strengthen the presentation and justification.

Point-by-point responses
  1. Referee: [CTFusion framework description (and abstract)] The central claim that CTFusion's two mitigations (per-agent independence under a shared team account + first-flag forwarding) fully neutralize both contamination and competition-impact problems requires stronger justification. The framework description does not specify how shared-account rate limits, platform logging, or sequential flag-submission order are prevented from creating observable differences between agents; without this, the 'robust solution' claim for live events rests on an untested isolation assumption.

    Authors: We agree that the framework section would benefit from expanded technical details on the isolation mechanisms. In the revised manuscript we will elaborate on the MCP server implementation, specifying that submissions are queued server-side in a first-come-first-served manner without exposing order or timing to agents, that rate-limit handling occurs at the platform level to equalize impact across agents, and that logging strips any agent-identifying metadata. These design choices are already present in the released code; we will add explicit description and a short justification of why they prevent observable differences, thereby addressing the isolation assumption more directly. A minimal sketch of this queueing and logging behavior appears after this response list. revision: yes

  2. Referee: [Experiments (and abstract)] The experimental support for the unreliability of existing CTF benchmarks (via web-search integration) is only partially verifiable. The abstract and setup report results with three LLMs and two agents but omit full methods details, specific metrics, quantitative outcomes, or error analysis, weakening the load-bearing claim that current benchmarks are unreliable.

    Authors: The full manuscript (Sections 3 and 4) already specifies the three LLMs, two agents, and the web-search integration experiment that demonstrates elevated success rates when external search is enabled. To improve verifiability we will expand the experiments section and add an appendix containing the complete method details (prompt templates, tool configurations), all quantitative success rates with and without search, and basic error analysis. The abstract will remain a high-level summary consistent with journal conventions. revision: partial

  3. Referee: [Live CTF setup and experiments] The assumption that live CTF events remain uncontaminated (and that the five selected events are representative) is not tested or discussed. Potential selection effects, prior exposure, or organizer-side leakage could still affect results, and no evidence is provided that the chosen live events avoid the contamination issues shown for static benchmarks.

    Authors: We will add a dedicated discussion subsection on the live-CTF experimental setup. It will describe the selection criteria for the five events (recency, diversity of challenge types, and public availability), explain why the live format inherently lowers the risk of pre-existing data contamination relative to static benchmarks, and acknowledge residual risks such as organizer-side leakage or selection effects. While exhaustive empirical verification of zero contamination is not feasible, the added discussion will make the assumptions and their limitations explicit. revision: yes
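
As a concrete reading of the isolation mechanisms described in response 1, the sketch below shows one way a server-side first-come-first-served gate could serialize live submissions, apply a single pacing rule toward the platform, and keep agent identity out of the shared log. The names (SubmissionGate, submit_to_platform, min_interval) are hypothetical; this is a minimal sketch under the rebuttal's stated assumptions, not the released implementation.

```python
# Hypothetical sketch of the isolation mechanisms: a server-side FCFS queue
# that serializes live submissions and a log with no agent-identifying metadata.
import queue
import threading
import time


class SubmissionGate:
    def __init__(self, submit_to_platform, min_interval: float = 1.0):
        self._submit = submit_to_platform   # callable(challenge_id, flag) -> bool
        self._queue = queue.Queue()         # first come, first served; order stays server-side
        self._min_interval = min_interval   # one shared pacing rule for every agent
        self._log = []
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, agent_id: str, challenge_id: str, flag: str) -> bool:
        """Any agent calls this; it sees only its own verdict, never queue order or timing."""
        done = threading.Event()
        slot = {}
        self._queue.put((challenge_id, flag, slot, done))
        done.wait()
        # agent_id is used only by the caller's own bookkeeping upstream;
        # nothing agent-identifying is written to the shared log.
        self._log.append({"challenge": challenge_id, "ts": round(time.time())})
        return slot["verdict"]

    def _worker(self):
        while True:
            challenge_id, flag, slot, done = self._queue.get()
            slot["verdict"] = self._submit(challenge_id, flag)
            done.set()
            time.sleep(self._min_interval)  # equal pacing toward the platform for all agents
```

Whether this suffices to hide all observable differences between agents (for example, added latency when the queue is busy) is exactly the isolation question the referee raises.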

Circularity Check

0 steps flagged

No circularity in the engineering framework proposal

full rationale

The paper introduces CTFusion as an applied streaming evaluation framework on live CTFs, with per-agent independence and first-flag forwarding as design mitigations. No mathematical derivations, equations, fitted parameters, or self-citations appear in the provided text that reduce any central claim to its own inputs by construction. The confirmation of contamination issues is described as an empirical experiment with web-search tools, and the robustness claim rests on the stated engineering choices rather than any self-referential loop or renamed prior result. This is a self-contained applied contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that reused CTF challenges cause measurable contamination when agents have web access, and that live events plus the forwarding rule remove this without new biases.

axioms (2)
  • Domain assumption: LLM agents equipped with web search can solve or cheat on reused CTF challenges.
    Used to demonstrate unreliability of existing benchmarks.
  • Domain assumption: Live CTF events supply challenges that have not been seen by the evaluated models.
    Core premise for contamination resistance.
invented entities (1)
  • CTFusion streaming framework (no independent evidence)
    Purpose: enables independent per-agent evaluation on shared live CTF accounts.
    New system introduced to solve the identified benchmark problems.

pith-pipeline@v0.9.0 · 5505 in / 1260 out tokens · 55761 ms · 2026-05-13T01:33:21.599669+00:00 · methodology


