Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities
Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis - a critical area for data security and its connection to LLMs' generalization abilities - remains underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs' decryption success rate and discuss their comprehension abilities. Our findings reveal key insights into LLMs' strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
Forward citations
Cited by 2 Pith papers
- Do LLMs Make Neural Distinguishers Wise?
LLM-based neural distinguishers on SPECK-32/64 show no improvement over ResNet but gain from XOR-inclusive prompts.
- Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography
Fine-tuned GPT-4.1-mini reaches 0.9072 static similarity and 92.5% functional correctness on a new synthetic dataset of cryptographic code migrations, outperforming zero-shot baselines.