pith. machine review for the scientific record.

arxiv: 2605.12813 · v1 · submitted 2026-05-12 · 💻 cs.CL · cs.AI · cs.CR · cs.LG

Recognition: no theorem link

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

Buyun Liang, Darshan Thaker, Fengrui Tian, Hamed Hassani, Jinqi Luo, Kaleab A. Kinfu, Kwan Ho Ryan Chan, Liangzu Peng, René Vidal

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:08 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CR · cs.LG
keywords adversarial attacks · LLM hallucinations · latent space optimization · realistic prompts · semantic equivalence · editing directions · constrained optimization

The pith

Optimizing continuous combinations of input-dependent latent editing directions produces realistic adversarial prompts that elicit hallucinations in large language models, including reasoning models where prior realistic attacks fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates hallucination elicitation as finding semantically coherent adversarial prompts equivalent to benign inputs. It builds an input-dependent dictionary of editing directions in latent space, where each direction corresponds to a valid rephrasing, then optimizes continuous linear combinations of those directions. This hybrid approach aims to retain the flexibility of continuous latent search while ensuring decoded outputs stay coherent and equivalent, unlike purely discrete rephrasings or unconstrained latent attacks. A sympathetic reader would care because the method reaches large reasoning models under free-form response settings, where existing realistic attacks do not succeed.
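The hybrid search described above can be sketched in miniature: instead of optimizing a free latent vector, the attack optimizes only the weights of a combination of precomputed editing directions, so every candidate stays inside the span of verified rephrasings. The 3-D latent space, `toy_loss`, and the gradient-free random search below are illustrative stand-ins, not the paper's implementation.

```python
# Hedged sketch of a REALISTA-style search loop: optimize a weight vector
# alpha over precomputed editing directions rather than a free latent vector.
# toy_loss is a placeholder for the real hallucination-elicitation objective.
import random

def combine(z0, directions, alpha):
    """Latent edit: z = z0 + sum_i alpha_i * d_i (element-wise, pure Python)."""
    z = list(z0)
    for a, d in zip(alpha, directions):
        for j in range(len(z)):
            z[j] += a * d[j]
    return z

def toy_loss(z):
    """Stand-in objective: distance to an arbitrary target latent. A real
    attack would instead score how strongly the decoded prompt hallucinates."""
    target = [1.0, -0.5, 0.25]
    return sum((zj - tj) ** 2 for zj, tj in zip(z, target))

def random_search(z0, directions, iters=200, seed=0):
    """Gradient-free search over weights normalized to sum to 1, so each
    candidate edit is a convex combination of the valid directions."""
    rng = random.Random(seed)
    best_alpha, best_val = None, float("inf")
    for _ in range(iters):
        raw = [rng.random() for _ in directions]
        s = sum(raw)
        alpha = [r / s for r in raw]
        val = toy_loss(combine(z0, directions, alpha))
        if val < best_val:
            best_alpha, best_val = alpha, val
    return best_alpha, best_val

z0 = [0.0, 0.0, 0.0]
directions = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.5, 0.0, 0.5]]
alpha, val = random_search(z0, directions)
```

Restricting the search to the directions' span is what distinguishes this from an unconstrained latent attack; the paper's actual optimizer and constraints (problem (12) in Figure 3) are more involved.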

Core claim

REALISTA constructs an input-dependent dictionary of valid editing directions in latent space, each corresponding to a semantically equivalent and coherent rephrasing of the original prompt, and optimizes continuous combinations of these directions to generate adversarial prompts that elicit LLM hallucinations while preserving realism and equivalence.

What carries the argument

Input-dependent dictionary of valid editing directions in latent space, which enables optimization of continuous combinations that decode to coherent rephrasings.
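Following the notation in the paper's Figure 2, each column of the dictionary D is a direction z(i) = c(i) − z0 from the original prompt's latent toward a verified rephrasing's latent. A minimal sketch, assuming a hypothetical `encode()` stand-in for the model's actual prompt encoder:

```python
# Minimal sketch of input-dependent edit dictionary construction.
# encode() is a hypothetical stand-in, not the paper's encoder.

def encode(prompt):
    """Toy deterministic encoder: hash characters into a fixed 4-D latent."""
    vec = [0.0] * 4
    for k, ch in enumerate(prompt):
        vec[k % 4] += (ord(ch) % 32) / 32.0
    return vec

def build_edit_dictionary(original_prompt, rephrasings):
    """Return (z0, D): each column of D is the direction from the original
    latent z0 toward the latent of one verified equivalent rephrasing."""
    z0 = encode(original_prompt)
    D = [[c - z for c, z in zip(encode(r), z0)] for r in rephrasings]
    return z0, D

z0, D = build_edit_dictionary(
    "What year did the Berlin Wall fall?",
    ["In which year did the Berlin Wall come down?",
     "The Berlin Wall fell in what year?"],
)
```

By construction, z0 plus any single column lands exactly on a valid rephrasing's latent; the load-bearing question, raised in the referee report, is whether combinations of columns stay valid too.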

If this is right

  • The method succeeds in attacking large reasoning models under free-form response settings where prior realistic attacks fail.
  • It achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs.
  • The constrained optimization framework combines the search richness of continuous latent attacks with the semantic guarantees of discrete rephrasing attacks.
  • It provides a practical way to generate realistic adversarial prompts for testing hallucination vulnerabilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent-direction approach could be applied to elicit other model failures such as biased outputs or unsafe responses.
  • Generated adversarial prompts could serve as synthetic data for training more robust LLMs against hallucinations.
  • The input-dependent dictionary construction might reveal which kinds of subtle rephrasings most reliably surface hallucination triggers in reasoning models.

Load-bearing premise

Continuous combinations of the input-dependent editing directions in latent space will decode to prompts that remain semantically equivalent and coherent rephrasings of the original benign prompt.

What would settle it

Decoding the optimized latent vectors and checking whether the resulting prompts are mostly incoherent or semantically divergent from the original input; if they are, or if they elicit no more hallucinations than discrete baselines on reasoning models, the central claim would not hold.
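The settling test could be automated along these lines. A real audit would use an NLI entailment model or human ratings; the word-overlap score below is only a cheap illustrative proxy, and the threshold is an assumption.

```python
# Hedged sketch of the proposed check: given decoded adversarial prompts,
# estimate what fraction remain equivalent to the original. Word-set
# Jaccard overlap is a toy proxy for a proper entailment-based metric.

def word_jaccard(a, b):
    """Similarity proxy in [0, 1]: overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def audit_equivalence(original, decoded_prompts, threshold=0.5):
    """Fraction of decoded prompts judged equivalent to the original; a low
    value would undercut the attack's realism claim."""
    kept = [p for p in decoded_prompts
            if word_jaccard(original, p) >= threshold]
    return len(kept) / len(decoded_prompts)

rate = audit_equivalence(
    "what year did the berlin wall fall",
    ["in what year did the berlin wall fall",
     "the capital of france is paris"],
)
```

Here one of the two decoded prompts survives the check, so `rate` is 0.5; in the paper's setting the same ratio, computed with a real equivalence metric, is exactly what the referee asks to see alongside attack success rates.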

Figures

Figures reproduced from arXiv: 2605.12813 by Buyun Liang, Darshan Thaker, Fengrui Tian, Hamed Hassani, Jinqi Luo, Kaleab A. Kinfu, Kwan Ho Ryan Chan, Liangzu Peng, René Vidal.

Figure 1: Illustrative example of attack generation in REALISTA.
Figure 2: (Left) Input-dependent edit dictionary construction. We employ a concept optimization procedure to construct a set of latent concepts c(1), …, c(n) conditioned on the original prompt x0 and WordNet (Miller, 1995). These concepts are assembled into an edit dictionary D, where each column corresponds to an interpretable editing direction z(i) = c(i) − z0. See §3.1 and Appendix §C for details on the…
Figure 3: Optimization trajectory when solving (12). At each optimization iteration, the objective value and the maximum constraint violation are reported as bootstrap means (10,000 resamples) computed over the MMLU subset.
Figure 6: Normalized word edit distance across layers between original prompts and their reconstructions. Dots indicate the best-performing layer for each model, and shaded regions show standard deviation over 10,000 bootstrap samples. Reconstruction quality degrades as depth increases. The reconstruction quality of decoder ψ directly affects our latent-space optimization in REALISTA. To assess this, we conduct an e…
Figure 7: Top 50 most frequently used concepts.
Figure 8: Top 50–100 most frequently used concepts.
Original abstract

Large language models (LLMs) achieve strong performance across many tasks but remain vulnerable to hallucinations, motivating the need for realistic adversarial prompts that elicit such failures. We formulate hallucination elicitation as a constrained optimization problem, where the goal is to find semantically coherent adversarial prompts that are equivalent to benign user prompts. Existing methods remain limited: discrete prompt-based attacks preserve semantic equivalence and coherence but search only over a limited set of prompt variations, while continuous latent-space attacks explore a richer space but often decode into prompts that are no longer valid rephrasings. To address these limitations, we propose REALISTA, a realistic latent-space attack framework. REALISTA constructs an input-dependent dictionary of valid editing directions, each corresponding to a semantically equivalent and coherent rephrasing, and optimizes continuous combinations of these directions in latent space. This design combines the optimization flexibility of continuous attacks with the semantic realism of discrete rephrasing-based attacks. Experiments demonstrate that REALISTA achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs and, crucially, succeeds in attacking large reasoning models under free-form response settings, where prior realistic attacks fail. Code is available at https://github.com/Buyun-Liang/REALISTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes REALISTA, a latent-space adversarial attack framework to elicit hallucinations in LLMs. It constructs an input-dependent dictionary of discrete valid rephrasing directions in latent space and optimizes continuous combinations of these directions to produce semantically coherent adversarial prompts that remain equivalent to the original benign input. Experiments claim that REALISTA achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs and, crucially, succeeds against large reasoning models in free-form response settings where prior realistic attacks fail. Code is released.

Significance. If the central assumption holds, the work would meaningfully advance realistic adversarial testing of LLMs by combining the search flexibility of continuous latent methods with the semantic guarantees of discrete rephrasing attacks. The reported success on reasoning models under free-form conditions addresses a documented gap in prior realistic attacks and could inform robustness evaluation practices. Reproducibility via released code is a positive factor.

major comments (2)
  1. [§3.2] §3.2 (Method): the optimization of continuous (linear) combinations of input-dependent editing directions assumes that arbitrary convex combinations remain within the decoder's region of valid, semantically equivalent rephrasings. No analysis or bound is provided showing that the latent directions form a subspace closed under the decoder; non-linear interactions could produce drifted or incoherent outputs while still optimizing the attack objective. This assumption is load-bearing for the superiority claim on reasoning models.
  2. [§5.3] §5.3 (Experiments on reasoning models): the reported success rates lack accompanying metrics confirming that the generated prompts remain semantically equivalent to the originals (e.g., via entailment scores, human ratings, or automatic similarity thresholds). Without such checks, it is unclear whether the performance gain stems from the continuous search or from inadvertently relaxed equivalence constraints.
minor comments (2)
  1. [Figure 2] Figure 2: axis labels and legend are too small for readability; increase font size and add a caption clarifying what the plotted directions represent.
  2. [Related Work] Related Work: the distinction between REALISTA and prior continuous latent attacks (e.g., those using direct latent optimization without the dictionary constraint) could be made more explicit with a short comparison table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our method's theoretical grounding and experimental validation. We address each point below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Method): the optimization of continuous (linear) combinations of input-dependent editing directions assumes that arbitrary convex combinations remain within the decoder's region of valid, semantically equivalent rephrasings. No analysis or bound is provided showing that the latent directions form a subspace closed under the decoder; non-linear interactions could produce drifted or incoherent outputs while still optimizing the attack objective. This assumption is load-bearing for the superiority claim on reasoning models.

    Authors: We agree that a formal analysis of closure under convex combinations would strengthen the presentation. Our construction derives each editing direction from a valid decoder output (i.e., a semantically equivalent rephrasing), and the optimization is performed only over directions that individually decode to coherent text. In practice, the resulting combinations remain within the valid region, as evidenced by the high semantic similarity scores we observe. In the revision we will add to §3.2 an explicit discussion of this assumption together with empirical measurements (entailment scores and cosine similarity in embedding space) across a range of combination coefficients, providing quantitative support for the validity of the interpolated prompts. revision: yes

  2. Referee: [§5.3] §5.3 (Experiments on reasoning models): the reported success rates lack accompanying metrics confirming that the generated prompts remain semantically equivalent to the originals (e.g., via entailment scores, human ratings, or automatic similarity thresholds). Without such checks, it is unclear whether the performance gain stems from the continuous search or from inadvertently relaxed equivalence constraints.

    Authors: We acknowledge that additional quantitative checks would make the equivalence claim more transparent. The original experiments relied on the fact that each basis direction is a verified valid rephrasing and on manual verification of coherence for the final outputs. For the revised version we will augment §5.3 with automatic semantic-equivalence metrics (entailment probability from a fine-tuned NLI model and sentence-level embedding similarity) computed on all adversarial prompts used in the reasoning-model experiments. These metrics will be reported alongside the attack success rates to confirm that equivalence constraints were not relaxed. revision: yes
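The measurement the rebuttal promises, similarity tracked across a range of combination coefficients, might look like the sketch below. The toy latent space and cosine metric are assumptions for illustration; the rebuttal's actual proposed metrics (NLI entailment probability, sentence-embedding similarity) are commitments about the revision, not released code.

```python
# Sketch of a coefficient sweep: scale an editing direction and track how
# far the edited latent drifts from the original, via cosine similarity.
# Toy 3-D latents stand in for real model activations.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sweep_similarity(z0, direction, scales):
    """Similarity of z0 + s * d to z0 for each scale s. A smooth, monotone
    decay would support bounding how far an edit can be pushed before the
    decoded prompt risks drifting from the original meaning."""
    out = []
    for s in scales:
        z = [a + s * b for a, b in zip(z0, direction)]
        out.append(cosine(z0, z))
    return out

sims = sweep_similarity([1.0, 1.0, 0.0], [0.0, 0.0, 1.0],
                        [0.0, 0.5, 1.0, 2.0])
```

With an orthogonal direction, similarity starts at 1.0 at scale 0 and decreases monotonically as the coefficient grows, which is the qualitative pattern the added measurements would need to exhibit for the equivalence claim to hold.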

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper constructs an input-dependent dictionary of discrete valid rephrasing directions and optimizes continuous combinations in latent space to generate adversarial prompts. Performance claims rest on direct experimental comparisons to prior realistic attacks across open-source LLMs and large reasoning models, without any reduction of success metrics to fitted parameters, self-definitional equivalences, or load-bearing self-citations. The assumption that convex combinations decode to coherent equivalents is presented as an empirical property tested in the evaluation, not a definitional tautology or imported uniqueness result. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the constructed editing directions preserve semantic equivalence when combined continuously; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Editing directions in the input-dependent dictionary correspond to semantically equivalent and coherent rephrasings.
    This premise is required for the optimized prompts to remain realistic and is stated as the basis for the dictionary construction.

pith-pipeline@v0.9.0 · 5561 in / 1147 out tokens · 56198 ms · 2026-05-14T20:08:45.207424+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

287 extracted references · 175 canonical work pages · 38 internal anchors
