pith. machine review for the scientific record.

arxiv: 2401.01313 · v3 · submitted 2024-01-02 · 💻 cs.CL

Recognition: 1 theorem link

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 19:10 UTC · model grok-4.3

classification 💻 cs.CL
keywords hallucination mitigation · large language models · survey · taxonomy · retrieval augmented generation · feedback mechanisms · knowledge retrieval · LLM reliability

The pith

A survey reviews more than thirty-two techniques for mitigating hallucinations in large language models and organizes them into a taxonomy based on dataset use, tasks, feedback, and retrievers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to compile and structure the existing methods that aim to stop large language models from producing content that looks factual yet lacks grounding in reliable information. Hallucinations block safe use of LLMs in high-stakes settings such as medical summaries or financial reports, because models trained on broad internet text can invent details or twist facts to fit prompts. The authors examine more than thirty-two concrete techniques, among them retrieval-augmented generation, knowledge retrieval, CoNLI, and chain-of-verification (CoVe). They sort these approaches according to how they draw on datasets, the tasks they target, the feedback loops they employ, and the kinds of retrievers they rely on. The survey also flags persistent challenges and limits, thereby marking open problems for later work.
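To make the four organizing axes concrete, here is a minimal sketch of the taxonomy as a data structure. The axis names follow the survey; the per-technique field values are illustrative guesses for this page, not labels taken from the paper's own tables.

```python
from dataclasses import dataclass

@dataclass
class Technique:
    """One surveyed method, positioned on the survey's four axes."""
    name: str
    dataset_use: str   # how the method draws on datasets
    task: str          # the task family it targets
    feedback: str      # feedback loop it employs, if any
    retriever: str     # retriever type it relies on, if any

# Example placements consistent with the survey's high-level split between
# retrieval-based methods (RAG, Knowledge Retrieval) and feedback-driven
# ones (CoNLI, CoVe); the exact field values here are assumptions.
TECHNIQUES = [
    Technique("RAG", "external corpus", "knowledge-intensive QA", "none", "dense retriever"),
    Technique("Knowledge Retrieval", "external knowledge source", "open-ended generation", "validation", "knowledge retriever"),
    Technique("CoNLI", "model outputs", "grounded generation", "NLI-based detection", "none"),
    Technique("CoVe", "model outputs", "long-form QA", "self-verification", "none"),
]

def by_axis(axis: str) -> dict[str, list[str]]:
    """Group technique names by their value on one taxonomy axis."""
    groups: dict[str, list[str]] = {}
    for t in TECHNIQUES:
        groups.setdefault(getattr(t, axis), []).append(t.name)
    return groups
```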

Core claim

The central claim is that a taxonomy built around dataset utilization, common tasks, feedback mechanisms, and retriever types brings order to more than thirty-two hallucination-mitigation techniques for large language models. The taxonomy distinguishes retrieval-based methods such as Retrieval Augmented Generation and Knowledge Retrieval from feedback-driven ones such as CoNLI and CoVe, while also cataloguing their respective limitations.

What carries the argument

A taxonomy that groups hallucination-mitigation techniques by dataset utilization, common tasks, feedback mechanisms, and retriever types.

If this is right

  • Practitioners can match mitigation methods to specific tasks and data resources using the taxonomy (see the selection sketch after this list).
  • The documented limitations point to concrete gaps that new techniques must address.
  • Feedback-based approaches can be combined with retriever-based ones to strengthen factual consistency.
  • The categorization supplies a baseline for measuring progress in LLM reliability.
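If the first bullet holds, method selection reduces to filtering on the taxonomy axes. A hedged continuation of the sketch above, reusing its `TECHNIQUES` list; the constraint values remain illustrative:

```python
def select(techniques, **constraints):
    """Return techniques whose taxonomy fields match every constraint."""
    return [
        t for t in techniques
        if all(getattr(t, axis) == value for axis, value in constraints.items())
    ]

# A team with no retrieval infrastructure might shortlist feedback-driven
# methods; one with a dense index might filter the other way.
feedback_only = select(TECHNIQUES, retriever="none")
retrieval_based = select(TECHNIQUES, retriever="dense retriever")
```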

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy may need updating whenever new mitigation ideas appear after the survey cutoff.
  • If the categories prove stable, they could support standardized benchmarks that compare hallucination rates across methods (a minimal harness is sketched after this list).
  • Extending the same organizing principles to other reliability issues, such as bias amplification, might reveal shared mechanisms.
  • Real-world deployment trials could test whether the taxonomy actually speeds up selection of suitable techniques for medical or legal text generation.
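A standardized benchmark of the kind the second bullet imagines needs only a shared prompt set and a shared hallucination judge. The harness below is a hypothetical sketch: `methods`, `prompts`, and the `is_hallucinated` judge are all assumed inputs, and the hard part in practice is the judge itself.

```python
def hallucination_rate(generate, prompts, is_hallucinated) -> float:
    """Fraction of prompts on which a mitigation pipeline still hallucinates.

    `generate` wraps an LLM plus one mitigation technique; `is_hallucinated`
    is a shared judge (human labels or a verifier model). Both are assumed.
    """
    flagged = sum(bool(is_hallucinated(p, generate(p))) for p in prompts)
    return flagged / len(prompts)

def compare(methods: dict, prompts, is_hallucinated) -> dict:
    """Score every method on the same prompts so the rates are comparable."""
    return {
        name: hallucination_rate(generate, prompts, is_hallucinated)
        for name, generate in methods.items()
    }
```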

Load-bearing premise

The chosen set of more than thirty-two techniques and the four-part taxonomy together give a complete and useful picture of the field with no major omissions.

What would settle it

Publication of a later survey that identifies a substantial number of additional techniques that fall outside the four categories or shows that the taxonomy does not help practitioners select methods for concrete applications.

read the original abstract

As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains around their tendency to hallucinate: generating content that appears factual but is ungrounded. This issue of hallucination is arguably the biggest hindrance to safely deploying these powerful LLMs into real-world production systems that impact people's lives. The journey toward widespread adoption of LLMs in practical settings heavily relies on addressing and mitigating hallucinations. Unlike traditional AI systems focused on limited tasks, LLMs have been exposed to vast amounts of online text data during training. While this allows them to display impressive language fluency, it also means they are capable of extrapolating information from the biases in training data, misinterpreting ambiguous prompts, or modifying the information to align superficially with the input. This becomes hugely alarming when we rely on language generation capabilities for sensitive applications, such as summarizing medical records, financial analysis reports, etc. This paper presents a comprehensive survey of over 32 techniques developed to mitigate hallucination in LLMs. Notable among these are Retrieval Augmented Generation (Lewis et al., 2021), Knowledge Retrieval (Varshney et al., 2023), CoNLI (Lei et al., 2023), and CoVe (Dhuliawala et al., 2023). Furthermore, we introduce a detailed taxonomy categorizing these methods based on various parameters, such as dataset utilization, common tasks, feedback mechanisms, and retriever types. This classification helps distinguish the diverse approaches specifically designed to tackle hallucination issues in LLMs. Additionally, we analyze the challenges and limitations inherent in these techniques, providing a solid foundation for future research in addressing hallucinations and related phenomena within the realm of LLMs.
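Of the techniques the abstract names, Chain-of-Verification (CoVe) lends itself to a compact illustration: draft an answer, plan verification questions, answer them independently, then revise. The sketch below is a loose paraphrase of that loop, not the paper's implementation; the `llm` callable and the prompt wording are placeholder assumptions, and Dhuliawala et al. (2023) should be consulted for the actual prompting scheme.

```python
def chain_of_verification(llm, question: str) -> str:
    """CoVe-style draft/verify/revise loop; prompts are illustrative only."""
    # 1. Draft a baseline answer.
    draft = llm(f"Answer the question.\n\nQ: {question}")
    # 2. Plan verification questions that probe the draft's factual claims.
    plan = llm(
        "List short fact-checking questions that would verify this answer, "
        f"one per line.\n\nAnswer: {draft}"
    )
    # 3. Answer each check independently, without showing the draft,
    #    so verification is not biased toward repeating its mistakes.
    checks = [(q, llm(f"Q: {q}")) for q in plan.splitlines() if q.strip()]
    evidence = "\n".join(f"{q} -> {a}" for q, a in checks)
    # 4. Produce a revised answer consistent with the verification results.
    return llm(
        "Revise the draft so it is consistent with the checks.\n\n"
        f"Q: {question}\nDraft: {draft}\nChecks:\n{evidence}"
    )
```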

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to provide a comprehensive survey of over 32 hallucination mitigation techniques in large language models (LLMs). It introduces a taxonomy categorizing these methods based on dataset utilization, tasks, feedback mechanisms, and retriever types, and discusses the challenges and limitations of these techniques.

Significance. Should the survey prove comprehensive and the taxonomy effective, the work would offer significant value by consolidating the literature on a critical issue for LLM reliability. This could guide future research in developing more robust mitigation strategies for applications in sensitive domains.

major comments (1)
  1. [Introduction] The paper asserts a 'comprehensive survey' of over 32 techniques without documenting the literature search protocol, including databases, search terms, inclusion criteria, or date cutoffs. This is a load-bearing omission for the central claim, as it prevents verification of whether the taxonomy covers the field adequately or omits important approaches like uncertainty estimation methods.
minor comments (2)
  1. [Abstract] Specific techniques are mentioned (e.g., RAG, CoNLI, CoVe) but the paper should include a table summarizing all surveyed techniques with their key characteristics to improve accessibility.
  2. [Taxonomy] The categorization parameters are listed but would benefit from explicit examples or a mapping table showing which techniques fall under each category.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We agree that explicitly documenting the literature search protocol is necessary to substantiate the claim of a comprehensive survey and to allow verification of the taxonomy's coverage. We will revise the manuscript to address this.

read point-by-point responses
  1. Referee: The paper asserts a 'comprehensive survey' of over 32 techniques without documenting the literature search protocol, including databases, search terms, inclusion criteria, or date cutoffs. This is a load-bearing omission for the central claim, as it prevents verification of whether the taxonomy covers the field adequately or omits important approaches like uncertainty estimation methods.

    Authors: We acknowledge this is a valid point and that the current manuscript lacks an explicit search protocol description. In the revised version, we will add a dedicated subsection in the Introduction (or a new 'Survey Methodology' section) detailing: databases searched (arXiv, Google Scholar, ACL Anthology, proceedings of NeurIPS/ICML/ICLR/ACL/EMNLP), search terms (e.g., 'LLM hallucination mitigation', 'reducing hallucinations in large language models', 'factuality in LLMs', 'hallucination detection and mitigation'), inclusion criteria (peer-reviewed or preprint papers from 2020 onward that propose or evaluate techniques specifically for mitigating hallucinations in generative LLMs, with empirical results), and cutoff date (December 2023). We will also review and incorporate uncertainty estimation approaches (e.g., methods using token-level confidence, ensemble variance, or self-consistency checks) into the taxonomy, likely under feedback mechanisms or as a related category, with discussion of how they complement or overlap with the 32+ techniques already covered. This revision will make the survey's scope verifiable. revision: yes
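A search protocol like the one promised here can be made mechanically reproducible. The following is a minimal sketch against the public arXiv API, using the rebuttal's own search terms and December 2023 cutoff; the final inclusion screen (does the paper propose or evaluate a mitigation technique, with empirical results) is left as a manual step.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
TERMS = [
    "LLM hallucination mitigation",
    "reducing hallucinations in large language models",
    "factuality in LLMs",
    "hallucination detection and mitigation",
]
CUTOFF = "2023-12-31"  # cutoff date stated in the rebuttal

def arxiv_search(term: str, max_results: int = 100):
    """Yield title/date pairs from the public arXiv API for one phrase."""
    query = urllib.parse.urlencode({
        "search_query": f'all:"{term}"',
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    for entry in root.iter(f"{ATOM}entry"):
        yield {
            "title": (entry.findtext(f"{ATOM}title") or "").strip(),
            "published": entry.findtext(f"{ATOM}published") or "",
        }

# Union the hits across terms, dedupe by title, apply the date cutoff.
candidates = {
    e["title"]: e
    for term in TERMS
    for e in arxiv_search(term)
    if e["published"] and e["published"][:10] <= CUTOFF
}
# Inclusion criteria (proposes or evaluates a hallucination-mitigation
# technique for generative LLMs, with empirical results) still require a
# manual or model-assisted screening pass over `candidates`.
```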

Circularity Check

0 steps flagged

No circularity: survey compiles external literature

full rationale

This is a literature survey paper that reviews and organizes over 32 existing hallucination mitigation techniques from external sources (e.g., RAG, CoVe, CoNLI). It introduces a taxonomy based on dataset utilization, tasks, feedback mechanisms, and retriever types, but this is presented as an organizational framework drawn from the reviewed works rather than any derivation, prediction, or fitted parameter. No equations, self-definitional claims, load-bearing self-citations, or reductions of results to the paper's own inputs exist. The central claims rest on compilation of prior literature, making the derivation chain self-contained with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the central contribution is the compilation and taxonomy of prior work. No free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5639 in / 1022 out tokens · 29658 ms · 2026-05-15T19:10:59.545675+00:00 · methodology

discussion (0)


Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 8.0

    REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

  2. Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection

    cs.CL 2026-05 accept novelty 7.0

    CiteTracer detects citation hallucinations at 97.1% accuracy on synthetic and real-world benchmarks by combining structured extraction, multi-source retrieval, deterministic matching, and class-specialist agents.

  3. VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

    cs.CV 2026-05 unverdicted novelty 7.0

    VideoNet is a new large-scale benchmark and training dataset for domain-specific action recognition that exposes limitations in VLMs and enables smaller fine-tuned models to surpass larger open-weight ones.

  4. HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs

    cs.AR 2026-04 unverdicted novelty 7.0

    HAVEN combines LLM agents for planning and gap analysis with protocol-specific templates and a custom DSL to generate correct UVM testbenches, achieving 100% compilation success, 90.6% code coverage, and 87.9% functio...

  5. Uncertainty Propagation in LLM-Based Systems

    cs.SE 2026-04 unverdicted novelty 7.0

    This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...

  6. Context-Fidelity Boosting: Enhancing Faithful Generation through Watermark-Inspired Decoding

    cs.CL 2026-04 unverdicted novelty 7.0

    Context-Fidelity Boosting reduces faithfulness hallucinations by applying context-based logit boosts to source-supported tokens during LLM decoding.

  7. GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning

    cs.AI 2026-03 unverdicted novelty 7.0

    GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference t...

  8. Common-agency Games for Multi-Objective Test-Time Alignment

    cs.GT 2026-05 unverdicted novelty 6.0

    CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

  9. Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

    cs.CL 2026-05 unverdicted novelty 6.0

    DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.

  10. Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    Each tested LLM shows its own characteristic unreliability when engaging in repair during extended math-question dialogues.

  11. PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

    cs.CL 2026-04 unverdicted novelty 6.0

    PRISM benchmark disentangles LLM hallucinations into knowledge missing, knowledge errors, reasoning errors, and instruction-following errors across three generation stages, revealing trade-offs when testing 24 models.

  12. From Clues to Generation: Language-Guided Conditional Diffusion for Cross-Domain Recommendation

    cs.IR 2026-04 unverdicted novelty 6.0

    LGCD creates pseudo-overlapping user data via LLM reasoning and uses conditional diffusion to generate target-domain user representations for inter-domain sequential recommendation without real overlapping users.

  13. Beyond Precision: Importance-Aware Recall for Factuality Evaluation in Long-Form LLM Generation

    cs.CL 2026-04 unverdicted novelty 6.0

    An importance-aware recall metric for LLM factuality evaluation reveals models are better at avoiding false claims than covering all relevant facts.

  14. CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era

    cs.CL 2026-02 unverdicted novelty 6.0

    CiteAudit supplies a human-validated benchmark and multi-agent verification system that outperforms existing LLMs and commercial tools at detecting hallucinated scientific references.

  15. Corrective Retrieval Augmented Generation

    cs.CL 2024-01 unverdicted novelty 6.0

    CRAG improves RAG robustness via a retrieval quality evaluator that triggers web augmentation and a decompose-recompose filter to focus on relevant information, yielding better results on short- and long-form generati...

  16. The Dynamic Gist-Based Memory Model (DGMM): A Memory-Centric Architecture for Artificial Intelligence

    cs.AI 2026-05 unverdicted novelty 3.0

    DGMM is proposed as an explicit graph-structured memory architecture for AI that enables persistent episodic memory, cue-based recall, and context-dependent interpretation without retraining.

  17. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

    cs.CL 2024-12 accept novelty 3.0

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

  18. A Survey on the Memory Mechanism of Large Language Model based Agents

    cs.AI 2024-04 accept novelty 3.0

    A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

  19. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

    cs.AI 2024-02 unverdicted novelty 3.0

    A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a t...

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 19 Pith papers · 5 internal anchors

  1. [1] Alfred V. Aho and Jeffrey D. Ullman. 1972.
  2. [2] Publications Manual. 1983.
  3. [3] Ashok K. Chandra, Dexter C. Kozen, and Larry J. Stockmeyer. 1981. doi:10.1145/322234.322243
  4. [4] Galen Andrew and Jianfeng Gao. Scalable training of …
  5. [5] Dan Gusfield. 1997.
  6. [6] Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Computing Research Repository.
  7. [7] Rie Kubota Ando and Tong Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research.
  8. [8] A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation. 2023.
  9. [9] "Why is this misleading?": Detecting News Headline Hallucinations with Explanations. 2023.
  10. [10] Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models. 2023.
  11. [11] Augmenting LLMs with Knowledge: A survey on hallucination prevention. 2023.
  12. [12] Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations. 2023.
  13. [13] Chain-of-Verification Reduces Hallucination in Large Language Models. 2023.
  14. [14] Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. 2023.
  15. [15] Detecting and Mitigating Hallucinations in Multilingual Summarisation. 2023.
  16. [16] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. 2023.
  17. [17] Hallucination Reduction in Long Input Text Summarization. 2023.
  18. [18] Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models. 2023.
  19. [19] Lyra: Orchestrating Dual Correction in Automated Theorem Proving. 2023.
  20. [20] Making Large Language Models Perform Better in Knowledge Graph Completion. 2023.
  21. [21] Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis. 2023.
  22. [22] Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. 2023.
  23. [23] Teaching Language Models to Hallucinate Less with Synthetic Tasks. 2023.
  24. [24] The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models. 2023.
  25. [25] Towards Mitigating Hallucination in Large Language Models via Self-Reflection. 2023.
  26. [26] Trusting Your Evidence: Hallucinate Less with Context-aware Decoding. 2023.
  27. [27] UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. 2023.
  28. [28] Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models.
  29. [29] How language model hallucinations can snowball. arXiv preprint arXiv:2305.13534.
  30. [30] Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. arXiv preprint arXiv:2309.01219.
  31. [31] Survey of hallucination in natural language generation. ACM Computing Surveys. 2023.
  32. [32] A survey of hallucination in large foundation models. arXiv preprint arXiv:2309.05922.
  33. [33] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv preprint arXiv:2005.11401v4.
  34. [34] FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation. 2023.
  35. [35] Hallucination Augmented Recitations for Language Models. 2023.
  36. [36] A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models. 2023.
  37. [37] Fine-Tuning Language Models for Factuality. Submitted to the Twelfth International Conference on Learning Representations.
  38. [38] Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey. 2023.
  39. [39] On What Basis? Predicting Text Preference Via Structured Comparative Reasoning. 2023.
  40. [40] Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification. 2023.
  41. [41] Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation. arXiv preprint arXiv:2311.09467.
  42. [42] R-Tuning: Teaching Large Language Models to Refuse Unknown Questions. arXiv preprint arXiv:2311.09677.
  43. [43] DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback. arXiv preprint arXiv:2311.10081.
  44. [44] Evgeniia Razumovskaia, Ivan Vulić, Pavle Marković, Tomasz Cichy, Qian Zheng, Tsung-Hsien Wen, and Paweł Budzianowski. arXiv:2311.09800.
  45. [45] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. arXiv preprint arXiv:2306.03341.
  46. [46] RARR: Researching and revising what language models say, using language models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  47. [47] Prompting GPT-3 to be reliable. arXiv preprint arXiv:2210.09150.
  48. [48] Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models. 2023.
  49. [49] Fine-tuning Language Models for Factuality. 2023.
  50. [50] The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations. 2023.
  51. [51] Partha Pratim Ray. ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. 2023. doi:10.1016/j.iotcps.2023.04.003
  52. [52] FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge. 2023.
  53. [53] GPT-4 is here: what scientists think. Nature. 2023.
  54. [54] Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. arXiv preprint arXiv:2305.15852.
  55. [55] A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. 2023.
  56. [56] Trapping LLM Hallucinations Using Tagged Context Prompts. 2023.
  57. [57] Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models. 2023.
  58. [58] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. International Conference on Learning Representations.
  59. [59] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
  60. [60] Self-Refine: Iterative Refinement with Self-Feedback. 2023.
  61. [61] Mateusz Lango and Ondrej Dusek. Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. doi:10.18653/v1/2023.emnlp-main.172
  62. [62] Instruction Tuning for Large Language Models: A Survey. 2023.
  63. [63] Self-Instruct: Aligning Language Models with Self-Generated Instructions. 2023.
  64. [64] Scaling Instruction-Finetuned Language Models. 2022.
  65. [65] OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization. 2023.
  66. [66] WizardLM: Empowering Large Language Models to Follow Complex Instructions. 2023.
  67. [67] Llama 2: Open Foundation and Fine-Tuned Chat Models. 2023.
  68. [68] Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. 2023.
  69. [69] Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, and Jingjing Liu. Large-Scale Adversarial Training for Vision-and-Language Representation Learning.
  70. [70] Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? AKA Will LLMs Replace Knowledge Graphs? arXiv preprint arXiv:2308.10168.
  71. [71] Enjoy the salience: Towards better transformer-based faithful explanations with word salience. arXiv preprint arXiv:2108.13759.
  72. [72] BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.
  73. [73] Hallucination Mitigation. Distilled AI.
  74. [74] Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  75. [75] Amrith Krishna, Ashim Gupta, Deepak Garasangi, Jeevnesh Sandhan, Pavankumar Satuluri, and Pawan Goyal. Neural Approaches for Data Driven Dependency Parsing in Sanskrit. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  76. [76] Jivnesh Sandhan, Om Adideva Paranjay, Komal Digumarthi, Laxmidhar Behra, and Pawan Goyal. Evaluating Neural Word Embeddings for Sanskrit. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  77. [77] Krishnan Sriram, Amba Kulkarni, and Gérard Huet. Validation and Normalization of DCS corpus and Development of the Sanskrit Heritage Engine's Segmenter. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  78. [78] Sarkar Sujoy, Amrith Krishna, and Pawan Goyal. Pre-annotation Based Approach for Development of a Sanskrit Named Entity Recognition Dataset. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  79. [79] Malay Maity, Sanjeev Panchal, and Amba Kulkarni. Disambiguation of Instrumental, Dative and Ablative Case suffixes in Sanskrit. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
  80. [80] A V S D S Mahesh and Arnab Bhattacharya. Creation of a Digital Rig Vedic Index (Anukramani) for Computational Linguistic Tasks. Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference. 2023.
Showing first 80 references.