Pith Number

pith:IF4G3HBK

pith:2026:IF4G3HBKUCWP3KLIGOLWGIE5XE

not attested not anchored not stored refs pending

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Guangtao Zheng, Hanjie Chen, Lingxi Zhang

Embedding-based defenses in LLM multi-agent systems fail when attackers craft messages whose embeddings lie close to benign ones, but token confidence scores provide a workable alternative for pruning suspicious messages.

arxiv:2605.01133 v2 · 2026-05-01 · cs.CR · cs.LG · cs.MA

Open paper page JSON Open Graph Bundle Merged state Verified badge What is a Pith Number?

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{IF4G3HBKUCWP3KLIGOLWGIE5XE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open · sign in to claim

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1strongest claim

Existing embedding-based defenses aim to detect and prune suspicious agents, but their effectiveness depends on a clear separation between the text embeddings of malicious and benign messages. Attackers can circumvent such defenses by crafting messages whose embeddings lie close to benign ones. We propose using confidence scores to prune or down-weight messages during MAS communication. Experiments show improved robustness across models, datasets, and communication topologies.

C2weakest assumption

That token-level confidence signals such as logits remain informative and separable when text embeddings are no longer distinguishable under the proposed attacks.

C3one line summary

Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of suspicious messages.

Cited by

1 paper in Pith

When Latent Agents Lie: KV-Cache Integrity in Multi-Agent LLM Collaboration

Receipt and verification

First computed	2026-06-23T01:13:05.318866Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0

Aliases

arxiv: 2605.01133 · arxiv_version: 2605.01133v2 · doi: 10.48550/arxiv.2605.01133 · pith_short_12: IF4G3HBKUCWP · pith_short_16: IF4G3HBKUCWP3KLI · pith_short_8: IF4G3HBK

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/IF4G3HBKUCWP3KLIGOLWGIE5XE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 41786d9c2aa0acfda968339763209db930962be7810cd39b2b9dd216bd656cd0

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "0272b13a1dd78c81064a1c140c14bc0dac2f299b4e22448b4db55ff8e20058a7",
    "cross_cats_sorted": [
      "cs.LG",
      "cs.MA"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-01T22:15:11Z",
    "title_canon_sha256": "1d8a766fb924ff97a892827ec97621ba6e97335a7d533024b79aa555ac6c5c46"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.01133",
    "kind": "arxiv",
    "version": 2
  }
}