Secret Leak Detection in Software Issue Reports using LLMs: A Comprehensive Evaluation
In the digital era, accidental exposure of sensitive information such as API keys, tokens, and credentials is a growing security threat. While most prior work focuses on detecting secrets in source code, leakage in software issue reports remains largely unexplored. This study fills that gap through a large-scale analysis and a practical detection pipeline for exposed secrets in GitHub issues. The pipeline combines regular-expression-based extraction with large language model (LLM)-based contextual classification to detect real secrets and reduce false positives. We build a benchmark of 54,148 instances from public GitHub issues, including 5,881 manually verified true secrets. Using this dataset, we evaluate entropy-based baselines and keyword heuristics from prior secret-detection tools, classical machine learning, deep learning, and LLM-based methods. Regex- and entropy-based approaches achieve high recall but poor precision, while smaller models such as RoBERTa and CodeBERT greatly improve performance (F1 = 92.70%). Proprietary models like GPT-4o perform moderately in few-shot settings (F1 = 80.13%), and fine-tuned larger open-source LLMs such as Qwen and LLaMA reach up to 94.49% F1. Finally, we validate the approach on 178 real-world GitHub repositories, achieving an F1-score of 81.6%, which demonstrates strong generalization to in-the-wild scenarios.
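The abstract's first pipeline stage (regex-based extraction filtered by entropy, before LLM classification) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the candidate pattern, the entropy threshold of 3.5 bits, and the sample input string are all assumptions chosen for the example.

```python
import math
import re

# Illustrative candidate pattern: long alphanumeric tokens that could be
# API keys or credentials. Real tools use many provider-specific regexes.
CANDIDATE_RE = re.compile(r"\b[A-Za-z0-9_\-]{20,}\b")

def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string in bits per character."""
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def extract_candidates(text: str, threshold: float = 3.5) -> list[str]:
    """Return high-entropy tokens that look like leaked secrets.

    High recall, low precision: an LLM-based contextual classifier
    would then decide whether each candidate is a real secret.
    """
    return [m.group() for m in CANDIDATE_RE.finditer(text)
            if shannon_entropy(m.group()) >= threshold]

# Fabricated example issue text; the key below is not a real credential.
issue = "Set AWS_KEY=AKIAxQ9rT2mY7uLpW3nZ8vb1 before running the tests"
print(extract_candidates(issue))
```

This kind of filter explains the abstract's reported trade-off: random-looking strings pass easily (high recall), but so do hashes, UUIDs, and commit IDs (poor precision), which is why a contextual second stage is needed.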
Forward citations
Cited by 1 Pith paper
IssueGuard: Real-Time Secret Leak Prevention Tool for GitHub Issue Reports
IssueGuard delivers real-time secret detection in GitHub issues via regex and CodeBERT, reaching 92.7% F1-score and outperforming pure regex scanners.