Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
Pith reviewed 2026-05-08 10:57 UTC · model grok-4.3
The pith
A lightweight proxy model can quantify uncertainty for black-box LLMs by learning their high-quality output regions through adversarial distillation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Distribution-Aligned Adversarial Distillation uses a generation-discrimination architecture to steer a proxy model toward the high-quality regions of a black-box LLM's output distribution; the proxy then reproduces the LLM's specific responses and estimates uncertainty through evidence learning, with experiments showing that even a proxy at one percent of the target model's size delivers reliable quantification.
What carries the argument
The generation-discrimination architecture in Distribution-Aligned Adversarial Distillation, which aligns the proxy to high-quality output regions so it can reproduce responses and estimate uncertainty via evidence learning.
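The abstract does not spell out the training objective, but "generation-discrimination" conventionally denotes a GAN-style minimax game. A minimal sketch of what such an objective could look like, assuming a standard adversarial distillation formulation; the symbols (proxy generator G_phi, discriminator D_psi) are ours, not the paper's:

```latex
\min_{\phi}\max_{\psi}\;
\mathbb{E}_{y \sim p_{\mathrm{LLM}}(\cdot \mid x)}\bigl[\log D_{\psi}(x, y)\bigr]
+ \mathbb{E}_{\hat{y} \sim G_{\phi}(\cdot \mid x)}\bigl[\log\bigl(1 - D_{\psi}(x, \hat{y})\bigr)\bigr]
```

The discriminator learns to separate black-box responses from proxy responses while the proxy learns to fool it; the "distribution-aligned" qualifier presumably adds a term restricting alignment to high-quality responses, which this sketch omits.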
If this is right
- Uncertainty can be computed in real time for any API-only LLM without internal access or repeated sampling.
- Small proxy models become sufficient to flag when the large model is likely to produce incorrect or fabricated output.
- Commercial systems gain a practical way to add reliability checks before presenting LLM answers to users.
- Resource use drops sharply compared with sampling-based uncertainty methods while retaining comparable detection power.
Where Pith is reading between the lines
- The same distillation pattern could be tested on other black-box generators such as image or code models to see whether uncertainty transfer holds beyond text.
- Combining the proxy's uncertainty signal with downstream verification steps might reduce error propagation in chained reasoning pipelines.
- If the proxy's learned distribution remains stable across model updates, it could serve as a lightweight monitor that stays useful even when the underlying LLM is retrained or replaced.
Load-bearing premise
The adversarial training successfully confines the proxy to high-quality regions of the black-box LLM's output distribution instead of its full, noisy behavior.
What would settle it
If uncertainty scores from the trained proxy show no correlation with actual error rates or hallucination frequency on held-out queries answered by the original LLM, the method fails to deliver reliable estimates.
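This criterion is directly testable once one has per-query uncertainty scores from the proxy and correctness judgments for the original LLM's answers. A minimal sketch of such a check in Python, assuming both arrays are available; the function name and interface are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

def falsification_check(proxy_uncertainty, llm_correct):
    """Test whether proxy uncertainty tracks the black-box LLM's actual errors.

    proxy_uncertainty: uncertainty score per held-out query, from the proxy.
    llm_correct:       1 if the LLM's answer was judged correct, else 0.
    """
    proxy_uncertainty = np.asarray(proxy_uncertainty, dtype=float)
    llm_error = 1 - np.asarray(llm_correct, dtype=int)
    # Rank correlation: higher uncertainty should accompany more errors.
    rho, p_value = spearmanr(proxy_uncertainty, llm_error)
    # AUROC: does uncertainty rank wrong answers above correct ones?
    auroc = roc_auc_score(llm_error, proxy_uncertainty)
    return {"spearman_rho": rho, "p_value": p_value, "auroc": auroc}
```

A Spearman rho near zero and an AUROC near 0.5 on held-out queries would be exactly the failure mode described above.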
Original abstract
Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet LLM hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive multiple sampling or internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To address this issue, we propose Distribution-Aligned Adversarial Distillation (DisAAD), which introduces a generation-discrimination architecture to guide a lightweight proxy model to learn the high-quality regions of the output distribution of the black-box LLM, thus effectively endowing it with the ability to know whether the black-box LLM knows or not. Subsequently, we use the proxy model to reproduce the specific responses of the black-box LLM and estimate the corresponding uncertainty based on evidence learning. Extensive experiments have verified the effectiveness and promise of our proposed method, indicating that a proxy model, even one that accounts for only 1% of the target LLM's size, can achieve reliable uncertainty quantification.
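"Evidence learning" here most plausibly refers to evidential deep learning in the style of Sensoy et al. (2018), where a head outputs non-negative evidence parameterizing a Dirichlet distribution and uncertainty has a closed form. A minimal sketch of that standard formulation; how DisAAD adapts it to free-form generation is not specified in the abstract:

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Closed-form uncertainty from evidential deep learning (Sensoy et al., 2018).

    evidence: non-negative evidence over K outcomes, e.g. from a
              softplus/ReLU head of the proxy model.
    Returns (per-outcome belief masses, vacuity uncertainty in (0, 1]).
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    alpha = evidence + 1.0   # Dirichlet concentration parameters
    S = alpha.sum()          # total Dirichlet strength
    belief = evidence / S    # belief mass per outcome
    uncertainty = K / S      # shrinks as accumulated evidence grows
    return belief, uncertainty

# Little evidence anywhere -> uncertainty near 1: the proxy signals
# that the black-box LLM likely "does not know".
belief, u = dirichlet_uncertainty([0.2, 0.1, 0.1])
```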
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Distribution-Aligned Adversarial Distillation (DisAAD), which uses a generation-discrimination architecture to train a lightweight proxy model (1% the size of the target) to learn high-quality regions of a black-box LLM's output distribution. The proxy then reproduces LLM responses and estimates uncertainty via evidence learning, with the central claim that this yields reliable uncertainty quantification for black-box LLMs without multiple sampling or internal access.
Significance. If the transfer of uncertainty properties holds, the approach would enable efficient, real-time uncertainty estimation for commercial black-box LLMs, addressing hallucination risks in a scalable way that existing sampling-based or white-box methods cannot. This has clear practical value for deployment.
Major comments (2)
- Abstract: The claim that 'extensive experiments have verified the effectiveness' and that a 1%-sized proxy achieves 'reliable uncertainty quantification' is load-bearing, yet the abstract (and by extension the reported results) provides no datasets, baselines, metrics (e.g., calibration curves, token-level KL divergence, or correlation between proxy evidence scores and LLM sampling entropy), or controls. Without these, it is impossible to confirm that the proxy's uncertainty matches the black-box LLM's epistemic uncertainty rather than reflecting the proxy's own training artifacts. (A sketch of one such calibration check follows this list.)
- Method description: The generation-discrimination architecture is asserted to 'guide the lightweight proxy to learn the high-quality regions' and thereby endow it with the ability to 'know whether the black-box LLM knows or not,' but no direct evidence is supplied that surface response matching plus discrimination transfers the relevant distributional properties for uncertainty. This assumption is central to the claim that the proxy faithfully estimates the target LLM's uncertainty.
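To make the first major comment concrete: expected calibration error over confidence bins is one standard form of the calibration check the referee names. A minimal sketch in Python; none of the names or thresholds come from the paper:

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Expected calibration error (ECE) over equal-width confidence bins.

    confidence: proxy-derived confidence (e.g. 1 - uncertainty) per query.
    correct:    1 if the black-box LLM answered correctly, else 0.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            # Gap between realized accuracy and claimed confidence in the bin.
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap   # weight by bin occupancy
    return ece
```

A well-aligned proxy should yield a small ECE; a proxy reflecting only its own training artifacts has no reason to.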
Minor comments (1)
- Abstract: The acronym 'DisAAD' is used before its expansion; ensure the full name appears on first use.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the practical significance of our work and for the constructive major comments. We address each point below and have prepared revisions to improve the clarity and evidentiary support in the manuscript.
Point-by-point responses
- Referee: Abstract: The claim that 'extensive experiments have verified the effectiveness' and that a 1%-sized proxy achieves 'reliable uncertainty quantification' is load-bearing, yet the abstract (and by extension the reported results) provides no datasets, baselines, metrics (e.g., calibration curves, token-level KL divergence, or correlation between proxy evidence scores and LLM sampling entropy), or controls. Without these, it is impossible to confirm that the proxy's uncertainty matches the black-box LLM's epistemic uncertainty rather than reflecting the proxy's own training artifacts.
  Authors: We agree that the abstract, due to space constraints, omits specific experimental details. The full paper details evaluations on multiple QA benchmarks, comparisons to baselines including temperature sampling and other uncertainty methods, and reports metrics such as uncertainty calibration error and correlation coefficients between proxy evidence and LLM output entropy. To make the abstract self-contained, we will revise it to include a brief mention of the evaluation protocol and key quantitative results demonstrating the proxy's alignment with the target LLM's uncertainty. Revision: yes.
- Referee: Method description: The generation-discrimination architecture is asserted to 'guide the lightweight proxy to learn the high-quality regions' and thereby endow it with the ability to 'know whether the black-box LLM knows or not,' but no direct evidence is supplied that surface response matching plus discrimination transfers the relevant distributional properties for uncertainty. This assumption is central to the claim that the proxy faithfully estimates the target LLM's uncertainty.
  Authors: The paper supports this through the adversarial training objective, which explicitly aligns distributions beyond surface matching. Evidence is provided via empirical results where the proxy reproduces LLM responses with high fidelity and its uncertainty estimates (via evidence learning) show strong agreement with direct sampling from the black-box model. We acknowledge that more direct distributional comparisons could be beneficial. In the revision, we will include additional analysis, such as KL divergence measurements on held-out responses (one computable form is sketched below) and ablations isolating the discrimination component's contribution to uncertainty transfer. Revision: partial.
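The promised KL measurements face an obstacle worth noting: a black-box API yields samples but not log-probabilities, so KL(p_LLM || p_proxy) is not directly computable. The usual computable surrogate is a Monte Carlo cross-entropy over sampled responses, which differs from that KL only by the LLM's fixed entropy. A minimal sketch under that assumption; `proxy_logprob` is a hypothetical interface to the proxy's scoring function, not an API from the paper:

```python
def per_token_cross_entropy(llm_samples, proxy_logprob):
    """Monte Carlo surrogate for KL(p_LLM || p_proxy) on held-out prompts.

    llm_samples:   (prompt, response) pairs sampled from the black-box LLM.
    proxy_logprob: hypothetical callable returning (total log-probability
                   of `response` given `prompt` under the proxy, token count).
    Cross-entropy = KL + H(p_LLM); the entropy term is constant across
    proxies, so this still ranks proxies and tracks alignment over training.
    """
    total_nll, total_tokens = 0.0, 0
    for prompt, response in llm_samples:
        logprob, n_tokens = proxy_logprob(prompt, response)
        total_nll -= logprob
        total_tokens += n_tokens
    return total_nll / max(total_tokens, 1)   # nats per token
```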
Circularity Check
No significant circularity; the proposal is a new empirical method without self-referential reduction.
Full rationale
The paper proposes Distribution-Aligned Adversarial Distillation (DisAAD) as a new architecture that trains a lightweight proxy via generation-discrimination to approximate high-quality regions of a black-box LLM's output distribution, followed by evidence-based uncertainty estimation on reproduced responses. This is presented as an empirical method whose effectiveness is verified by experiments rather than a closed mathematical derivation. No equations or steps are shown that reduce the proxy uncertainty scores to the training inputs by construction, no load-bearing self-citations are invoked to justify uniqueness or ansatzes, and no fitted parameters are relabeled as independent predictions. The central result (reliable uncertainty from a 1% proxy) is therefore not tautological with the method's own definitions or prior self-citations; it remains an externally testable claim.
Axiom & Free-Parameter Ledger
Axioms (1)
- Standard math: standard machine learning assumptions on distribution alignment and adversarial training convergence.
Invented entities (1)
- DisAAD generation-discrimination architecture (no independent evidence).