Fact-checking the output of large language models via token-level uncertainty quantification

Fadeeva, E · 2024 · arXiv 2403.04696

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Sanity Checks for Long-Form Hallucination Detection

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Hallucination detectors on LLM reasoning traces often rely on final-answer artifacts rather than reasoning validity; once controlled, lightweight lexical trajectory features suffice for robust detection.

Confidence-Aware Alignment Makes Reasoning LLMs More Reliable

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

CASPO trains LLMs via iterative direct preference optimization so that token-level confidence tracks step-wise correctness, then applies Confidence-aware Thought pruning at inference to improve both reliability and speed on reasoning benchmarks.

Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.

Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders

cs.IR · 2026-04-09 · unverdicted · novelty 6.0

KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.

Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

cs.CL · 2025-02-20 · unverdicted · novelty 6.0

Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.

Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models

cs.CL · 2024-08-20 · unverdicted · novelty 6.0

A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.

The CRISTAL Method: Neurosymbolic analysis from AI-synthesized world models

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

CRISTAL is a neurosymbolic framework that synthesizes interpretable probabilistic world models from language priors for full Bayesian analysis and budget-aware data acquisition, claiming Bayes-optimal accuracy on synthetic equity classification with 5 examples.

IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

cs.CL · 2026-04-16 · unverdicted · novelty 5.0

IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.

Can LLMs Make (Personalized) Access Control Decisions?

cs.CR · 2025-11-25 · unverdicted · novelty 5.0

LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades a closer match to individual preferences for potentially less secure results when users grant excessive permissions.

Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models

cs.CL · 2025-03-24 · unverdicted · novelty 4.0

LLMs show improved accuracy on gastroenterology questions but remain overconfident in self-reported certainty across commercial, open-source, and quantized variants.

LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

cs.LG · 2026-05-05

citing papers explorer

Showing 2 of 2 citing papers after filters.

Confidence-Aware Alignment Makes Reasoning LLMs More Reliable

cs.AI · 2026-05-08 · unverdicted · none · ref 8

CASPO trains LLMs via iterative direct preference optimization so that token-level confidence tracks step-wise correctness, then applies Confidence-aware Thought pruning at inference to improve both reliability and speed on reasoning benchmarks.

Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders

cs.IR · 2026-04-09 · unverdicted · none · ref 11

KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.
