SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

Adetayo Adebimpe; Helmut Neukirchen; Thomas Welsh

arxiv: 2510.21459 · v1 · pith:EZTY2RXKnew · submitted 2025-10-24 · 💻 cs.CR · cs.CL· cs.LG

SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

Adetayo Adebimpe , Helmut Neukirchen , Thomas Welsh This is my paper

classification 💻 cs.CR cs.CLcs.LG

keywords llmsaccuracymodelssystemsystemsattackerbeendata-protection

0 comments

read the original abstract

Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility. However research has highlighted that context-awareness, such as the ability to respond to new attack types, systems and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown as one approach to increase context awareness but suffer from several challenges including accuracy and timeliness of response time, high operational costs and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework which manages data-protection issues through the use of lightweight local LLMs. We investigate the use of Retrieval Augmented Generation (RAG) supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several different metrics such as response time differences, realism from human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models while models that have been tuned via a system prompt that tells the LLM to respond like a Linux system achieve without RAG a similar accuracy as untuned with RAG, while having a slightly lower latency.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Prompt Governance? On Governing Technologies Governed by Natural Language
cs.CY 2026-04 unverdicted novelty 4.0

Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.