Pith Number

pith:ZSY4N6MW

pith:2026:ZSY4N6MWW75WIJTWVIJZYOQYTV
not attested · not anchored · not stored · refs resolved

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

Buyun Liang, Darshan Thaker, Fengrui Tian, Hamed Hassani, Jinqi Luo, Kaleab A. Kinfu, Kwan Ho Ryan Chan, Liangzu Peng, René Vidal

Optimizing continuous combinations of input-dependent latent editing directions produces realistic adversarial prompts that elicit hallucinations in large language models, including reasoning models where prior realistic attacks fail.

arxiv:2605.12813 v1 · 2026-05-12 · cs.CL · cs.AI · cs.CR · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZSY4N6MWW75WIJTWVIJZYOQYTV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
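The page does not describe the merge algorithm itself. The sketch below only illustrates the property it claims, namely that any mirror holding the same record and the same set of events recomputes the same state. It uses a hypothetical scheme of our own: events are deduplicated by the SHA-256 of their canonical JSON, ordered by timestamp with that hash as a tie-breaker, and folded into a copy of the record. `merge_state`, `event_id`, and the event fields `ts`, `field`, and `value` are illustrative assumptions, and signature checking (the page names Ed25519) is omitted.

```python
import copy
import hashlib
import json
from typing import Any


def canonical_bytes(obj: Any) -> bytes:
    # Same canonical JSON form as the verification command on this page.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def event_id(event: dict) -> str:
    # Hypothetical: an event is identified by the SHA-256 of its canonical bytes.
    return hashlib.sha256(canonical_bytes(event)).hexdigest()


def merge_state(record: dict, events: list[dict]) -> dict:
    """Hypothetical deterministic merge: the result depends only on the set
    of events, not on the order in which a mirror received them."""
    # Deduplicate by content hash so a replayed event is applied once.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first (assumes one fixed ISO-8601 UTC format,
    # so string order equals time order), content hash as tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["ts"], kv[0]))
    state = copy.deepcopy(record)  # never mutate the canonical record
    for _, event in ordered:
        # Illustrative rule: a later event overwrites the same status field.
        state.setdefault("status", {})[event["field"]] = event["value"]
    return state


record = {"source": {"id": "2605.12813", "kind": "arxiv", "version": 1}}
events = [
    {"ts": "2026-05-18T03:09:12Z", "field": "citations", "value": "open"},
    {"ts": "2026-05-18T03:09:12Z", "field": "replications", "value": "open"},
]
# Two mirrors that received the events in opposite orders agree.
print(merge_state(record, events) == merge_state(record, events[::-1]))  # True
```

The content-hash tie-breaker is what keeps the order total: two events with the same timestamp still land in the same sequence on every mirror.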

Claims

C1 · strongest claim

REALISTA achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs and, crucially, succeeds in attacking large reasoning models under free-form response settings, where prior realistic attacks fail.

C2 · weakest assumption

That continuous combinations of the input-dependent editing directions in latent space will decode to prompts that remain semantically equivalent and coherent rephrasings of the original benign prompt.

C3 · one-line summary

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

References

287 extracted · 287 resolved · 44 Pith anchors

Receipt and verification
First computed: 2026-05-18T03:09:12.386151Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

ccb1c6f996b7fb642676aa139c3a189d5e068162de71696ff0320bac65b18213

Aliases

arxiv: 2605.12813 · arxiv_version: 2605.12813v1 · doi: 10.48550/arxiv.2605.12813 · pith_short_12: ZSY4N6MWW75W · pith_short_16: ZSY4N6MWW75WIJTW · pith_short_8: ZSY4N6MW
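The Pith Number and its short aliases appear to be derived from the canonical hash. For this record, the first 26 characters of the RFC 4648 base32 encoding of the 32 raw hash bytes reproduce ZSY4N6MWW75WIJTWVIJZYOQYTV, and each pith_short_N alias is the first N characters of that string. This is our inference from this one record, not a documented rule, so the sketch below (with hypothetical helpers `pith_number_from_hash` and `short_aliases`) is a cross-check rather than the specification.

```python
import base64

CANONICAL_HASH = "ccb1c6f996b7fb642676aa139c3a189d5e068162de71696ff0320bac65b18213"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    # Assumption: the Pith Number is the leading characters of the base32
    # (RFC 4648 alphabet A-Z, 2-7) encoding of the raw SHA-256 bytes.
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    # The pith_short_N aliases listed on this page are N-character prefixes.
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # ZSY4N6MWW75WIJTWVIJZYOQYTV
print(short_aliases(number))
```

Under this reading, 26 base32 characters carry 130 bits of the 256-bit digest, and the shorter aliases trade collision resistance for brevity.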
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZSY4N6MWW75WIJTWVIJZYOQYTV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ccb1c6f996b7fb642676aa139c3a189d5e068162de71696ff0320bac65b18213
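The jq and python3 stages of the command above can also be run as a standalone function, for example on a record already saved to disk. The sketch below reproduces only that canonicalization step (sorted keys, compact separators, ensure_ascii=False, UTF-8) followed by SHA-256. It assumes the input is the canonical_record object returned by the endpoint; `canonical_sha256` and `verify_file` are hypothetical helper names.

```python
import hashlib
import json

EXPECTED = "ccb1c6f996b7fb642676aa139c3a189d5e068162de71696ff0320bac65b18213"


def canonical_sha256(record: dict) -> str:
    # Mirror the page's one-liner: sorted keys, no whitespace,
    # non-ASCII kept as UTF-8 rather than \uXXXX escapes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify_file(path: str, expected: str = EXPECTED) -> bool:
    # Assumes the file holds only the canonical_record object.
    with open(path, encoding="utf-8") as fh:
        return canonical_sha256(json.load(fh)) == expected


# Key order in the input does not change the digest.
a = {"schema_version": "1.0", "source": {"id": "2605.12813", "kind": "arxiv", "version": 1}}
b = {"source": {"version": 1, "kind": "arxiv", "id": "2605.12813"}, "schema_version": "1.0"}
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

Sorting keys and fixing separators is what makes the digest independent of how a server or mirror happened to serialize the same record.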
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "339fba31b1998b97b89bc18eabcf085c1fbc2053ddf6649336771a1cd5d8cab8",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CR",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-12T23:13:50Z",
    "title_canon_sha256": "54b38ced34ede93225965606699cea7fb68549cbe10a2f636fb363631e092389"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12813",
    "kind": "arxiv",
    "version": 1
  }
}