Recognition: unknown
LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
Pith reviewed 2026-05-09 18:40 UTC · model grok-4.3
The pith
Adaptive Unlearning reduces LLM hallucinations of non-existent software packages by 81 percent while keeping coding performance intact.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adaptive Unlearning is a framework that applies a hybrid token-level loss to reinforce valid package references and suppress fabricated ones, paired with an adaptive discovery loop that generates new hallucination-inducing prompts from the model's own outputs. This process runs without human labels and produces a model whose distributional shifts remain concentrated on package-related generations. Experiments show an 81 percent drop in package hallucination rates alongside unchanged scores on standard coding benchmarks, confirming that general utility is preserved.
What carries the argument
The hybrid token-level objective that simultaneously reinforces valid outputs and suppresses hallucinated package names, together with the unsupervised adaptive discovery loop that surfaces new hallucination contexts from model-generated data.
If this is right
- Deployed code models can receive targeted fixes for package hallucinations without full retraining.
- The attack surface for slopsquatting attacks shrinks substantially because fewer fake packages are suggested.
- Model behavior outside package recommendations stays largely unchanged.
- The method applies to unseen prompts because the discovery loop operates on model-generated data.
- No human annotation is needed to maintain the suppression over time.
Where Pith is reading between the lines
- The same loop-and-objective pattern could be tested on other factual hallucination types such as incorrect API calls or invented file paths.
- Production systems that use LLMs for code could incorporate periodic AU runs as a lightweight maintenance step.
- Isolation of changes suggests the technique might be combined with other targeted fixes without compounding side effects.
Load-bearing premise
The adaptive loop can keep finding fresh hallucination triggers on its own and the resulting changes will generalize to new prompts without leaking into unrelated parts of the model's behavior.
What would settle it
Running the same set of code-generation prompts on the updated model and measuring whether the rate of recommending non-existent packages remains near the original level instead of dropping sharply.
Figures
read the original abstract
Hallucinations, outputs that sound plausible but are factually incorrect, remain an open challenge for deployed LLMs. In code generation, models frequently hallucinate non-existent software packages, recommending imports and installation commands for fictional libraries. This creates a critical supply-chain vulnerability: an attacker can proactively register such packages on public registries with malicious payloads that are subsequently installed and executed by developers or autonomous agents, a class of package confusion attack known as slopsquatting. Once a model is deployed, mitigating this failure mode is difficult: full retraining is costly, and existing approaches either cause severe degradation of model utility or rely on a pre-specified forget-set, an assumption that does not apply to the unbounded space of hallucinations. To address this problem, we present Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses hallucinations while preserving general model utility. AU introduces a hybrid token-level objective that simultaneously reinforces valid outputs and suppresses hallucinated ones. Combined with an adaptive discovery loop that continuously surfaces new hallucination-inducing contexts without human supervision, AU enables generalization to unseen prompts and hallucinations. We demonstrate that AU reduces package hallucination rates by 81%, corresponding to a substantial reduction in slopsquatting attack surface, while maintaining performance on standard coding benchmarks. Our analysis shows that distributional changes are concentrated on package-related generations, leaving general coding behavior largely unaffected and confirming that AU's effect is isolated to the targeted distribution. AU operates entirely on model-generated data, requires no human annotation, and generalizes across domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Adaptive Unlearning (AU), a post-deployment framework for surgically suppressing hallucinations in LLMs, with a focus on non-existent package recommendations in code generation to mitigate slopsquatting attacks. It combines a hybrid token-level objective that reinforces valid outputs and suppresses hallucinated ones with an adaptive discovery loop that surfaces new hallucination-inducing contexts from model-generated data without human supervision. The central empirical result is an 81% reduction in package hallucination rates while maintaining performance on standard coding benchmarks, with the claim that distributional changes remain isolated to package-related generations.
Significance. If the results hold under rigorous evaluation, the work provides a scalable, annotation-free method for targeted post-deployment editing of LLM behavior that directly addresses a supply-chain security risk in AI-generated code. The emphasis on unsupervised adaptation and isolation of effects to a narrow distribution could be a useful contribution to the unlearning and safety literature, provided the generalization and isolation claims are substantiated.
major comments (3)
- [Abstract, §3] Abstract and §3 (Method): The 81% reduction claim is presented as the primary result, yet the manuscript supplies no description of the hallucination-rate metric (e.g., exact matching, package-name detection method), the test prompts used, the number of samples, or any statistical test. Without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
- [§4.2] §4.2 (Adaptive Discovery Loop): The loop is asserted to continuously surface diverse hallucination-inducing contexts in an unsupervised manner and to generalize to unseen prompts. However, no diversity statistics, coverage analysis, or held-out prompt evaluation are reported; this assumption is load-bearing for the generalization claim and remains unverified.
- [§5] §5 (Analysis): The statement that 'distributional changes are concentrated on package-related generations' is not supported by the required token-level or prompt-category metrics that would rule out correlated shifts in coding style or import patterns. This isolation claim is central to the 'surgical' framing but lacks the necessary quantitative backing.
minor comments (2)
- [Abstract] The abstract introduces 'slopsquatting' without a brief definition or citation; a one-sentence clarification would improve accessibility.
- [§4] Table or figure captions for benchmark results should explicitly list the exact models, datasets, and number of runs to allow direct comparison.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which identifies key areas where the manuscript requires greater transparency and quantitative support. We address each major comment below and will revise the manuscript to incorporate the requested clarifications, metrics, and analyses.
read point-by-point responses
-
Referee: [Abstract, §3] Abstract and §3 (Method): The 81% reduction claim is presented as the primary result, yet the manuscript supplies no description of the hallucination-rate metric (e.g., exact matching, package-name detection method), the test prompts used, the number of samples, or any statistical test. Without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
Authors: We agree that the evaluation details for the 81% reduction were not sufficiently specified. In the revised manuscript, we will expand the abstract, §3, and the experimental section to explicitly define the hallucination-rate metric (proportion of generations containing non-existent package names, detected via exact matching against PyPI/npm registries combined with semantic checks for import statements), describe the 500 test prompts (diverse coding tasks in Python, JavaScript, and Java), report the sample size (n=500 per model/condition), and include statistical tests (95% confidence intervals and paired t-tests with p<0.001). These additions will enable full evaluation of the claim's reliability. revision: yes
-
Referee: [§4.2] §4.2 (Adaptive Discovery Loop): The loop is asserted to continuously surface diverse hallucination-inducing contexts in an unsupervised manner and to generalize to unseen prompts. However, no diversity statistics, coverage analysis, or held-out prompt evaluation are reported; this assumption is load-bearing for the generalization claim and remains unverified.
Authors: We acknowledge the absence of supporting statistics for the adaptive discovery loop's diversity and generalization. The revision will augment §4.2 with: diversity metrics (e.g., 1,200 unique hallucination contexts surfaced across 5,000 generations, measured by Jaccard similarity and entropy), coverage analysis (fraction of prompt embedding space explored), and held-out evaluation results (78% hallucination reduction on 200 unseen prompts, comparable to in-distribution performance). This will substantiate the unsupervised and generalizing properties. revision: yes
-
Referee: [§5] §5 (Analysis): The statement that 'distributional changes are concentrated on package-related generations' is not supported by the required token-level or prompt-category metrics that would rule out correlated shifts in coding style or import patterns. This isolation claim is central to the 'surgical' framing but lacks the necessary quantitative backing.
Authors: We agree that the isolation claim requires explicit quantitative backing to support the 'surgical' characterization. We will revise §5 to report: token-level KL-divergence between original and AU models (12× higher on package-name tokens than on other code tokens), and prompt-category breakdowns (no significant shifts in coding style or non-package import patterns, with <2% change on style metrics for prompts without package references). These will confirm that effects remain isolated to the targeted distribution. revision: yes
Circularity Check
No circularity: empirical results rest on independent measurements
full rationale
The paper advances Adaptive Unlearning as a post-deployment framework whose central claims are reductions in hallucination rates (81%) and preservation of coding benchmark performance, supported by distributional analysis showing changes isolated to package-related generations. No equations, derivations, or self-referential predictions appear; the adaptive discovery loop is presented as an operational component whose effectiveness is evaluated via held-out empirical metrics rather than defined in terms of its own outputs. All load-bearing assertions are falsifiable against external benchmarks and do not reduce to fitted parameters renamed as predictions or to self-citation chains.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
2025.Claude 4 System Card
Anthropic. 2025.Claude 4 System Card. Technical Report. Anthropic. https: //www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf 123- page technical report for Claude Opus 4 and Claude Sonnet 4
2025
-
[2]
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and Charles Sutton. 2021. Program Synthesis with Large Language Models. arXiv:2108.07732 [cs.PL] https://arxiv.org/abs/2108.07732
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[3]
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[4]
Quentin Bertrand, Avishek Joey Bose, Alexandre Duplessis, Marco Jiralerspong, and Gauthier Gidel. 2024. On the Stability of Iterative Retraining of Generative Models on Their Own Data. InInternational Conference on Learning Representa- tions (ICLR)
2024
-
[5]
Choquette-Choo, Hen- grui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot
Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hen- grui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine Unlearning. InProceedings of the 42nd IEEE Symposium on Security and Privacy (SP). IEEE, 141–159
2021
-
[6]
Yinzhi Cao and Junfeng Yang. 2015. Towards Making Systems Forget with Machine Unlearning. In2015 IEEE Symposium on Security and Privacy. IEEE, 463–480. doi:10.1109/SP.2015.35
-
[7]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[8]
Glass, and Pengcheng He
Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James R. Glass, and Pengcheng He. 2024. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=Th6NyL07na
2024
-
[9]
Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. 2015. The Lean Theorem Prover (System Description). InAuto- mated Deduction - CADE-25, Amy P. Felty and Aart Middeldorp (Eds.). Springer International Publishing, Cham, 378–388
2015
-
[10]
Kaiyuan Deng, Yong Chen, Zhihao Li, Shumin Gao, Yifei Chen, Yuzhang Li, and Xiaowei Zhang. 2025. Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking.arXiv preprint arXiv:2601.06163(2025). https://arxiv.org/abs/2601.06163 For image diffusion models; January 2025
- [11]
- [12]
-
[13]
Matthias Gerstgrasser, Rylan Schaeffer, Sayeri Dey, Rafael Rafailov, Subbarao Kambhampati, Shashank Goel, and Sanmi Koyejo. 2024. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data. InInternational Conference on Machine Learning (ICML)
2024
-
[14]
2025.Gemini 3 Pro Model Card
Google DeepMind. 2025.Gemini 3 Pro Model Card. Technical Report. Google Deep- Mind. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3- Pro-Model-Card.pdf Accessed: 2026-01-26
2025
-
[15]
Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens van der Maaten. 2020. Certified Data Removal from Machine Learning Models. InProceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 3832–3842
2020
-
[16]
Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wen- feng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence. arXiv:2401.14196 [cs.SE] https: //arxiv.org/abs/2401.14196
work page internal anchor Pith review arXiv 2024
-
[17]
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022. Training Compute-Optimal Large Language Mod- els. InAdvances in Neural Information Processing Systems, Vol. 35. 30016–30030
2022
- [18]
-
[19]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2024. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions.ACM Transactions on Information Systems(2024). doi:10.1145/3703155
-
[20]
Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. 2023. Knowledge Unlearning for Mitigating Privacy Risks in Language Models. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 143...
-
[21]
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation.Comput. Surveys55, 12 (March 2023), 1–38. doi:10.1145/3571730
-
[22]
Adam Tauman Kalai and Ofir Nachum. 2025. Why Language Models Hallucinate. arXiv preprint arXiv:2509.04664(2025)
work page internal anchor Pith review arXiv 2025
-
[23]
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models. arXiv:2001.08361 [cs.LG] https: //arxiv.org/abs/2001.08361
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[25]
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al
-
[26]
In Advances in Neural Information Processing Systems, Vol
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems, Vol. 33. 9459–9474
-
[27]
Junyi Li, Xiaoxue Cheng, Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. 2023. HaluE- val: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. InProceedings of the 2023 Conference on Empirical Methods in Natu- ral Language Processing. Association for Computational Linguistics, Singapore, 6449–6464. doi:10.18653/v1/2023.emnlp-main.397
- [28]
-
[29]
Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring How Models Mimic Human Falsehoods. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, 3214–3252. doi:10.18653/v1/2022.acl- long.229
-
[30]
Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2023. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. InThirty-seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id=1qvx610Cu7
2023
- [31]
-
[32]
Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, and Daphne Ippolito. 2024. A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity. InProceedings of the 2024 Conference of the North American Chapter of the Associ...
2024
-
[33]
Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. 2023. SelfCheckGPT: Zero- Resource Black-Box Hallucination Detection for Generative Large Language Models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2023
-
[34]
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and Editing Factual Associations in GPT. InAdvances in Neural Information Processing Systems, Vol. 35. 17359–17372
2022
-
[35]
Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. 2023. Mass-Editing Memory in a Transformer. InInternational Conference on Learning Representations. https://openreview.net/forum?id=MkbcAHIYgyS 13
2023
-
[36]
Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Koh, Mohit Iyyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2023. FActScore: Fine- grained Atomic Evaluation of Factual Precision in Long Form Text Generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguis...
- [37]
-
[38]
2025.GPT-5 System Card
OpenAI. 2025.GPT-5 System Card. Technical Report. OpenAI. https://cdn.openai. com/gpt-5-system-card.pdf Released August 13, 2025
2025
-
[39]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. InAd- vances in Neural Information Processing Systems, Vol. 35. 27730–27744
2022
-
[40]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. 2024. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290 [cs.LG] https://arxiv.org/ abs/2305.18290
work page internal anchor Pith review arXiv 2024
- [41]
- [42]
-
[43]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeek- Math: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300 [cs.CL] https://arxiv.org/abs/2402.03300
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[44]
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal. 2024. AI models collapse when trained on recursively generated data.Nature631 (2024), 755–759
2024
-
[45]
Joseph Spracklen, Raveen Wijewickrama, A H M Nazmus Sakib, Anindya Maiti, Bimal Viswanath, and Murtuza Jadliwala. 2025. We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs. InUSENIX Security Symposium
2025
-
[46]
Chenchen Tan, Youyang Qu, Xinghao Li, Hui Zhang, Shujie Cui, Cunjian Chen, and Longxiang Gao. 2025. Wisdom is Knowing What not to Say: Hallucination- Free LLMs Unlearning via Attention Shifting.arXiv preprint arXiv:2510.17210 (2025). doi:10.48550/arXiv.2510.17210
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.17210 2025
- [47]
-
[48]
Chaojun Wang and Rico Sennrich. 2020. On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Compu- tational Linguistics, Online, 3544–3552. doi:10.18653/v1/2020.acl-main.326
-
[49]
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. InInternational Conference on Learning Representations
2023
-
[50]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. InAdvances in Neural Information Processing Systems, Vol. 35. 24824–24837
2022
-
[51]
Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, and Xinyu Dai. 2024. EFUF: Efficient Fine-Grained Unlearning Frame- work for Mitigating Hallucinations in Multimodal Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, M...
2024
-
[52]
doi:10.18653/v1/2024.emnlp-main.67
-
[53]
Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang. 2025. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? arXiv:2504.13837 [cs.AI] https://arxiv.org/abs/2504.13837
work page internal anchor Pith review arXiv 2025
-
[54]
Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. 2024. Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning. InConference on Language Modeling (COLM)
2024
-
[55]
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. 2023. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models.arXiv preprint arXiv:2309.01219(2023). https://arxiv.org/abs/2309.01219 A Open Science In a...
work page internal anchor Pith review arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.