pith:IF4G3HBK
When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
Embedding-based defenses in LLM multi-agent systems fail when attackers craft messages whose embeddings lie close to benign ones, but token confidence scores provide a workable alternative for pruning suspicious messages.
arxiv:2605.01133 v2 · 2026-05-01 · cs.CR · cs.LG · cs.MA
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{IF4G3HBKUCWP3KLIGOLWGIE5XE}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Existing embedding-based defenses aim to detect and prune suspicious agents, but their effectiveness depends on a clear separation between the text embeddings of malicious and benign messages. Attackers can circumvent such defenses by crafting messages whose embeddings lie close to benign ones. We propose using confidence scores to prune or down-weight messages during communication in a multi-agent system (MAS). Experiments show improved robustness across models, datasets, and communication topologies.
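Why the embedding-based approach depends on separation can be illustrated with a toy filter. This is a minimal sketch under stated assumptions, not the defense evaluated in the paper: `embedding_prune`, the leave-one-out centroid rule, the cosine threshold, and the 3-dimensional vectors are all hypothetical stand-ins for real sentence embeddings. A message whose embedding points away from the others is flagged, while one crafted to sit near the benign embeddings passes unnoticed.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embedding_prune(embeddings: dict[str, list[float]], threshold: float) -> set[str]:
    """Hypothetical embedding-based defense: flag every agent whose message
    embedding has cosine similarity below `threshold` to the centroid of the
    other agents' embeddings in the same round."""
    flagged = set()
    for agent, emb in embeddings.items():
        others = [e for a, e in embeddings.items() if a != agent]
        if cosine(emb, centroid(others)) < threshold:
            flagged.add(agent)
    return flagged


# Toy 3-d "embeddings" (illustrative values, not data from the paper).
benign = {"a": [1.0, 0.1, 0.0], "b": [0.9, 0.2, 0.1], "c": [1.0, 0.0, 0.1]}
naive_attack = {**benign, "m": [0.0, 0.1, 1.0]}      # far from the benign cluster
aligned_attack = {**benign, "m": [0.95, 0.1, 0.05]}  # crafted to sit near it

print(embedding_prune(naive_attack, threshold=0.5))    # {'m'}
print(embedding_prune(aligned_attack, threshold=0.5))  # set()
```

The filter only sees geometry, so once the attacker's embedding lands inside the benign cluster there is nothing left for it to separate, which is the failure mode the claims describe.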
The approach rests on the assumption that token-level confidence signals, such as logits, remain informative and separable even when text embeddings are no longer distinguishable under the proposed attacks.
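Confidence-based pruning and down-weighting can be sketched as follows, under loud assumptions: the record does not say which confidence score the paper uses, so this sketch uses the geometric-mean token probability (the exponential of the mean per-token log-probability) and assumes each agent's token log-probabilities are exposed alongside its message. The names `Message`, `confidence`, and `prune_and_weight`, the threshold, and the log-probability values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    agent: str
    text: str
    token_logprobs: list[float]  # log-probability of each generated token (assumed exposed by the model)


def confidence(msg: Message) -> float:
    """Geometric-mean token probability, exp(mean log-prob), in (0, 1].
    Chosen here for illustration; the paper's exact score may differ."""
    return math.exp(sum(msg.token_logprobs) / len(msg.token_logprobs))


def prune_and_weight(messages: list[Message], threshold: float) -> dict[str, float]:
    """Drop messages whose confidence falls below `threshold`; give the rest
    weights proportional to confidence, normalized to sum to 1."""
    scores = {m.agent: confidence(m) for m in messages}
    kept = {agent: s for agent, s in scores.items() if s >= threshold}
    total = sum(kept.values())
    return {agent: s / total for agent, s in kept.items()} if total > 0 else {}


# Illustrative log-probabilities, not measurements from the paper.
round_msgs = [
    Message("a", "The answer is 12.", [-0.1, -0.2, -0.1]),
    Message("b", "I also get 12.", [-0.05, -0.15]),
    Message("m", "The answer is 7.", [-2.0, -1.5, -2.5]),
]
weights = prune_and_weight(round_msgs, threshold=0.5)
print({agent: round(w, 2) for agent, w in weights.items()})  # {'a': 0.49, 'b': 0.51}
```

With a threshold of 0.5, the low-confidence message from `m` is dropped and the two remaining messages receive weights of about 0.49 and 0.51. Returning normalized weights lets a downstream aggregator either discard suspicious messages outright or merely down-weight them.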
Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of suspicious messages.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-23T01:13:05.318866Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IF4G3HBKUCWP3KLIGOLWGIE5XE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0
```
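The same check can be sketched in Python with only the standard library, for readers without `jq`. It mirrors the shell one-liner (sorted keys, compact separators, non-ASCII kept as UTF-8) and assumes, as the `curl` command does, that the endpoint returns JSON with a top-level `canonical_record` field. The helper names and constants are illustrative and not part of any Pith client library.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/IF4G3HBKUCWP3KLIGOLWGIE5XE"
EXPECTED = "41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0"


def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record serialized like the shell one-liner:
    keys sorted recursively, no whitespace, non-ASCII emitted as UTF-8."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """Fetch the JSON-LD document and return its `canonical_record` field (needs network)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["canonical_record"]
```

Calling `canonical_sha256(fetch_canonical_record())` and comparing the result with `EXPECTED` reproduces the shell check. Because the digest is taken over the canonical serialization, the indentation of the displayed record does not affect it.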
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "0272b13a1dd78c81064a1c140c14bc0dac2f299b4e22448b4db55ff8e20058a7",
    "cross_cats_sorted": [
      "cs.LG",
      "cs.MA"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-01T22:15:11Z",
    "title_canon_sha256": "1d8a766fb924ff97a892827ec97621ba6e97335a7d533024b79aa555ac6c5c46"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.01133",
    "kind": "arxiv",
    "version": 2
  }
}
```