Pith Number

pith:IF4G3HBK

pith:2026:IF4G3HBKUCWP3KLIGOLWGIE5XE
not attested · not anchored · not stored · refs pending

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Guangtao Zheng, Hanjie Chen, Lingxi Zhang

Embedding-based defenses in LLM multi-agent systems fail when attackers craft messages whose embeddings lie close to benign ones, but token confidence scores provide a workable alternative for pruning suspicious messages.

arxiv:2605.01133 v2 · 2026-05-01 · cs.CR · cs.LG · cs.MA

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IF4G3HBKUCWP3KLIGOLWGIE5XE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
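
The idea of recomputing state from signed events can be illustrated with a minimal sketch. The actual Pith merge algorithm is not described on this page, so everything here is an assumption made for illustration: the event fields (`id`, `ts`, `kind`, `value`), the sample events, and the last-writer-wins rule are all hypothetical. The sketch only shows why sorting events by a total order before folding them makes the result independent of arrival order.

```python
from typing import Any

# Hypothetical event shape: {"id": str, "ts": str, "kind": str, "value": Any}.
# None of these field names are taken from the Pith schema.

def merge_state(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict in a fixed total order."""
    # De-duplicate by event id so replayed copies do not matter.
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id): timestamp ties are broken by id,
    # so every mirror applies events in the same sequence.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for e in ordered:
        state[e["kind"]] = e["value"]  # last writer wins per kind (assumed rule)
    return state

# Made-up events for illustration only.
events = [
    {"id": "b", "ts": "2026-06-02T00:00:00Z", "kind": "citations", "value": 1},
    {"id": "a", "ts": "2026-06-01T00:00:00Z", "kind": "citations", "value": 0},
    {"id": "c", "ts": "2026-06-01T00:00:00Z", "kind": "author_claim", "value": "open"},
]

# A different arrival order, with duplicates, yields the same state.
shuffled = list(reversed(events)) + events
print(merge_state(events) == merge_state(shuffled))
```

Under these assumptions, two mirrors that hold the same set of events compute the same state even if they received the events in different orders.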

Claims

C1 · strongest claim

Existing embedding-based defenses aim to detect and prune suspicious agents, but their effectiveness depends on a clear separation between the text embeddings of malicious and benign messages. Attackers can circumvent such defenses by crafting messages whose embeddings lie close to benign ones. We propose using confidence scores to prune or down-weight messages during MAS communication. Experiments show improved robustness across models, datasets, and communication topologies.
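
The dependence on embedding separation can be seen in a toy detector. This is a sketch, not the paper's setup: the three-dimensional vectors, the benign centroid, and the 0.8 cosine threshold are invented for illustration. It shows a detector that flags messages far from a benign centroid and how a message crafted to sit near that centroid passes unflagged.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def flag_by_embedding(embedding, benign_centroid, threshold=0.8):
    """Flag a message whose embedding is too dissimilar from the benign centroid."""
    return cosine(embedding, benign_centroid) < threshold

# Toy embeddings (assumed values, not taken from the paper).
benign_centroid = [1.0, 0.0, 0.0]
overt_attack = [0.0, 1.0, 0.0]     # orthogonal to the benign centroid
crafted_attack = [0.95, 0.1, 0.0]  # deliberately close to the benign centroid

print(flag_by_embedding(overt_attack, benign_centroid))    # True: flagged
print(flag_by_embedding(crafted_attack, benign_centroid))  # False: slips through
```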

C2 · weakest assumption

That token-level confidence signals such as logits remain informative and separable when text embeddings are no longer distinguishable under the proposed attacks.

C3 · one line summary

Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of suspicious messages.
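
A confidence-based filter of the kind summarized above might look like the following sketch. The paper's exact scoring rule is not given on this page, so the choices here are assumptions: confidence is taken as the geometric-mean token probability (the exponential of the mean token log-probability), low confidence is treated as suspicious, the 0.5 pruning threshold is arbitrary, and the message fields `text` and `token_logprobs` are hypothetical names.

```python
import math

def message_confidence(token_logprobs):
    """Geometric-mean token probability: exp of the mean log-probability."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def weigh_messages(messages, prune_below=0.5):
    """Drop low-confidence messages and normalise the remaining weights to sum to 1."""
    scored = [(m["text"], message_confidence(m["token_logprobs"])) for m in messages]
    kept = [(text, conf) for text, conf in scored if conf >= prune_below]
    total = sum(conf for _, conf in kept)
    return [(text, conf / total) for text, conf in kept]

# Made-up messages from three agents (assumed values).
messages = [
    {"text": "answer A", "token_logprobs": [-0.1, -0.1]},
    {"text": "answer B", "token_logprobs": [-0.2, -0.2]},
    {"text": "answer C", "token_logprobs": [-2.0, -1.0]},  # low confidence, pruned
]

for text, weight in weigh_messages(messages):
    print(text, round(weight, 3))
```

Pruning and down-weighting are combined here for brevity; a receiving agent could equally apply only one of the two.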

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-06-23T01:13:05.318866Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0

Aliases

arxiv: 2605.01133 · arxiv_version: 2605.01133v2 · doi: 10.48550/arxiv.2605.01133 · pith_short_12: IF4G3HBKUCWP · pith_short_16: IF4G3HBKUCWP3KLI · pith_short_8: IF4G3HBK
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IF4G3HBKUCWP3KLIGOLWGIE5XE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0272b13a1dd78c81064a1c140c14bc0dac2f299b4e22448b4db55ff8e20058a7",
    "cross_cats_sorted": [
      "cs.LG",
      "cs.MA"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-01T22:15:11Z",
    "title_canon_sha256": "1d8a766fb924ff97a892827ec97621ba6e97335a7d533024b79aa555ac6c5c46"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.01133",
    "kind": "arxiv",
    "version": 2
  }
}
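
For readers who prefer to check the hash offline, the serialization step of the shell one-liner under "Verify this Pith Number yourself" can be reproduced as a standalone Python function. This sketch mirrors the options used there (`sort_keys=True`, compact separators, `ensure_ascii=False`, UTF-8 encoding) and applies them to the record shown above. The function name `canonical_hash` is an illustrative choice, not part of any Pith API.

```python
import hashlib
import json

def canonical_hash(record: dict) -> str:
    """SHA-256 over the key-sorted, compact, UTF-8 JSON serialization."""
    data = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "0272b13a1dd78c81064a1c140c14bc0dac2f299b4e22448b4db55ff8e20058a7",
        "cross_cats_sorted": ["cs.LG", "cs.MA"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-05-01T22:15:11Z",
        "title_canon_sha256": "1d8a766fb924ff97a892827ec97621ba6e97335a7d533024b79aa555ac6c5c46",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.01133", "kind": "arxiv", "version": 2},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED = "41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0"

digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest.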