Hallucination detectors on LLM reasoning traces often rely on final-answer artifacts rather than reasoning validity; once controlled, lightweight lexical trajectory features suffice for robust detection.
hub
Fact-checking the output of large language models via token- level uncertainty quantification
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 11roles
background 2polarities
background 2representative citing papers
CASPO trains LLMs via iterative direct preference optimization so that token-level confidence tracks step-wise correctness, then applies Confidence-aware Thought pruning at inference to improve both reliability and speed on reasoning benchmarks.
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.
Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.
A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.
CRISTAL is a neurosymbolic framework that synthesizes interpretable probabilistic world models from language priors for full Bayesian analysis and budget-aware data acquisition, claiming Bayes-optimal accuracy on synthetic equity classification with 5 examples.
ACSE estimates LLM uncertainty via adaptive semantic entropy clustering with conformal prediction guarantees, reporting higher AUROC than token entropy baselines on datasets like TriviaQA.
IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.
LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades off higher individual match for potentially less secure results when users over-permission.
LLMs show improved accuracy on gastroenterology questions but remain overconfident in self-reported certainty across commercial, open-source, and quantized variants.
citing papers explorer
-
Sanity Checks for Long-Form Hallucination Detection
Hallucination detectors on LLM reasoning traces often rely on final-answer artifacts rather than reasoning validity; once controlled, lightweight lexical trajectory features suffice for robust detection.
-
Confidence-Aware Alignment Makes Reasoning LLMs More Reliable
CASPO trains LLMs via iterative direct preference optimization so that token-level confidence tracks step-wise correctness, then applies Confidence-aware Thought pruning at inference to improve both reliability and speed on reasoning benchmarks.
-
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
-
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders
KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.
-
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models
Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.
-
Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models
A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.
-
The CRISTAL Method: Neurosymbolic analysis from AI-synthesized world models
CRISTAL is a neurosymbolic framework that synthesizes interpretable probabilistic world models from language priors for full Bayesian analysis and budget-aware data acquisition, claiming Bayes-optimal accuracy on synthetic equity classification with 5 examples.
-
LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy
ACSE estimates LLM uncertainty via adaptive semantic entropy clustering with conformal prediction guarantees, reporting higher AUROC than token entropy baselines on datasets like TriviaQA.
-
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation
IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.
-
Can LLMs Make (Personalized) Access Control Decisions?
LLMs reflect users' privacy preferences in access control decisions with up to 86% agreement and can promote safer behavior, but personalization trades off higher individual match for potentially less secure results when users over-permission.
-
Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models
LLMs show improved accuracy on gastroenterology questions but remain overconfident in self-reported certainty across commercial, open-source, and quantized variants.