Large language models can be used to effectively scale spear phishing campaigns
7 Pith papers cite this work.
Citing papers
- PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
  PASA is a semantic-level watermarking method for LLM text that uses embedding-space clusters and synchronized randomness to remain detectable after paraphrasing while preserving text quality (a minimal detection sketch follows the list).
- Process Matters more than Output for Distinguishing Humans from Machines
  A battery of 30 cognitive tasks shows that process-level behavioral features distinguish humans from frontier AI agents better than performance metrics (mean AUC 0.88); process-specific fine-tuning improves mimicry but transfers poorly across tasks (see the AUC sketch after the list).
- Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models
  Adaptive Stealing makes watermark theft from LLMs more efficient via Position-Based Seal Construction and Adaptive Selection modules that dynamically choose optimal attack perspectives (a generic frequency-contrast sketch follows the list).
- Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
  Harmful generation in LLMs relies on a compact, unified set of weights that alignment compresses and that is distinct from benign capabilities, which helps explain emergent misalignment (an attribute-and-ablate sketch follows the list).
- An Independent Safety Evaluation of Kimi K2.5
  Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
  AutoDAN automatically generates semantically meaningful jailbreak prompts for aligned LLMs via a hierarchical genetic algorithm, achieving higher attack success, cross-model transferability, and universality than baselines while bypassing perplexity-based defenses (a search skeleton follows the list).
- Jailbroken: How Does LLM Safety Training Fail?
  LLM safety training fails because of competing objectives and mismatched generalization, enabling new jailbreaks that succeed on every unsafe prompt from red-teaming evaluation sets against GPT-4 and Claude.
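
Below are short, hedged sketches for the technique-focused entries above. First, the PASA entry: a minimal sketch of cluster-based semantic watermark detection, assuming precomputed sentence embeddings, fixed cluster centroids, and a shared key. The keyed hashing scheme, the 50/50 green split, and the z-test are illustrative assumptions, not the paper's exact construction.

```python
import hashlib
import numpy as np

def detect_semantic_watermark(sentence_embeddings, centroids, key, z_threshold=4.0):
    """Score a text for a cluster-based semantic watermark (sketch).

    Each sentence embedding is assigned to its nearest centroid; a keyed
    hash of the previous cluster id seeds a PRNG that marks half the
    clusters "green". Watermarked text should land in green clusters far
    more often than the 50% chance rate, and that bias survives
    paraphrases that preserve sentence-level semantics.
    """
    hits, prev = 0, 0
    for emb in sentence_embeddings:
        # Nearest-centroid assignment in embedding space.
        cluster = int(np.argmin(np.linalg.norm(centroids - emb, axis=1)))
        # Synchronized randomness: seed derived from (key, previous cluster).
        seed = int.from_bytes(hashlib.sha256(f"{key}:{prev}".encode()).digest()[:8], "big")
        rng = np.random.default_rng(seed)
        green = rng.permutation(len(centroids))[: len(centroids) // 2]
        hits += cluster in green
        prev = cluster
    n = len(sentence_embeddings)
    # One-proportion z-test against the 0.5 chance rate.
    z = (hits - 0.5 * n) / np.sqrt(0.25 * n)
    return z, z > z_threshold
```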
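For the process-vs-output entry: a toy version of the evaluation recipe, i.e. train a classifier on process-level features and report ROC AUC. The features here are random stand-ins (the real battery uses behavioral signals such as response latencies and error patterns); only the classifier-plus-AUC pipeline mirrors the summary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical process-level features per session (e.g. inter-response
# latencies, error-correction rate, exploration entropy), not raw scores.
rng = np.random.default_rng(0)
X_human = rng.normal(0.0, 1.0, size=(200, 5))
X_agent = rng.normal(0.6, 1.0, size=(200, 5))   # agents shifted in feature space
X = np.vstack([X_human, X_agent])
y = np.array([0] * 200 + [1] * 200)              # 0 = human, 1 = AI agent

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"process-feature AUC: {auc:.2f}")  # the paper reports mean AUC 0.88 on real data
```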
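For the watermark-stealing entry: the summary does not detail Position-Based Seal Construction or Adaptive Selection, so this sketch only shows the frequency-contrast core that stealing attacks generally build on, i.e. tokens over-represented in watermarked output relative to a baseline corpus are likely "green". The function names, smoothing, and cutoff are assumptions, not the paper's algorithm.

```python
from collections import Counter

def estimate_green_tokens(watermarked_texts, baseline_texts, tokenize, top_k=500):
    """Crude watermark-stealing baseline via frequency contrast (sketch)."""
    wm = Counter(t for s in watermarked_texts for t in tokenize(s))
    base = Counter(t for s in baseline_texts for t in tokenize(s))
    n_wm, n_base = sum(wm.values()), sum(base.values())
    # Smoothed relative-frequency ratio per token; high ratio suggests "green".
    score = {t: (wm[t] / n_wm) / ((base[t] + 1) / (n_base + 1)) for t in wm}
    return sorted(score, key=score.get, reverse=True)[:top_k]
```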
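For the distinct-mechanism entry: a toy attribute-then-ablate loop on a small MLP. The paper's claim concerns LLM weights mediating harmful generation; here random tensors stand in for harmful and benign data, so this demonstrates only the methodology (rank weights by |grad x weight| on one behaviour, zero the top slice, compare both behaviours), not the finding.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))

# Toy stand-ins: a real study would use harmful vs. benign prompts
# through an LLM, not random tensors (assumption for self-containment).
x_harm, y_harm = torch.randn(64, 16), torch.randint(0, 4, (64,))
x_benign, y_benign = torch.randn(64, 16), torch.randint(0, 4, (64,))
loss_fn = torch.nn.CrossEntropyLoss()

# 1. Attribute: score each weight by |grad * weight| on the "harmful" loss.
loss_fn(model(x_harm), y_harm).backward()
scores = torch.cat([(p.grad * p).abs().flatten() for p in model.parameters()])
threshold = scores.quantile(0.99)  # keep only the top 1% of weights

# 2. Ablate that compact subset and compare both behaviours; a "distinct,
#    unified mechanism" would predict the harmful-task loss rising sharply
#    while the benign-task loss stays roughly flat.
with torch.no_grad():
    for p in model.parameters():
        p[(p.grad * p).abs() >= threshold] = 0.0
    print("harmful-task loss:", loss_fn(model(x_harm), y_harm).item())
    print("benign-task loss :", loss_fn(model(x_benign), y_benign).item())
```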
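For the AutoDAN entry: a skeleton of a hierarchical genetic search over prompts, with sentence-level crossover and word-level synonym mutation as the two hierarchy levels. The fitness function, seed prompts, and synonym table are abstracted away as caller-supplied inputs; this is a sketch of the search structure described in the summary, not the authors' implementation.

```python
import random

def hierarchical_ga(seed_prompts, fitness, synonyms, generations=50, pop_size=32):
    """AutoDAN-style hierarchical genetic search (sketch).

    Assumes at least a handful of seed prompts. Crossover swaps whole
    sentences between two parents (paragraph level); mutation swaps
    individual words for listed synonyms (word level), so candidates
    remain fluent text with low perplexity. `fitness` is assumed to
    score how strongly a prompt elicits the target behaviour.
    """
    pop = list(seed_prompts)[:pop_size]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            sa, sb = a.split(". "), b.split(". ")
            # Sentence-level crossover: prefix of one parent, suffix of the other.
            cut = random.randint(1, min(len(sa), len(sb)))
            child = ". ".join(sa[:cut] + sb[cut:])
            # Word-level mutation: occasionally replace a word with a synonym.
            words = child.split()
            for i, w in enumerate(words):
                if w in synonyms and random.random() < 0.1:
                    words[i] = random.choice(synonyms[w])
            children.append(" ".join(words))
        pop = elite + children
    return max(pop, key=fitness)
```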