The Eleventh International Conference on Learning Representations , year=

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation , author=

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

representative citing papers

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

SCA framework applies Information Bottleneck to assign step-level confidence in black-box LLM reasoning traces, flagging errors and boosting self-correction success by up to 13.5% on math and QA tasks.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.

Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

Social identity markers in medical questions degrade LLM accuracy and uncertainty calibration, producing a calibration crisis that is non-additive for intersectional cases.

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

cs.CL · 2023-03-15 · unverdicted · novelty 6.0

SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

A geometric relation of the error introduced by sampling a language model's output distribution to its internal state

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

A geometric 1-form on token embeddings has curvature that couples to semantic world models in language models, as evidenced by clustering on chess board regions and piece importance.

citing papers explorer

Showing 7 of 7 citing papers.

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution cs.CL · 2026-05-19 · unverdicted · none · ref 17
SCA framework applies Information Bottleneck to assign step-level confidence in black-box LLM reasoning traces, flagging errors and boosting self-correction success by up to 13.5% on math and QA tasks.
Forecasting Downstream Performance of LLMs With Proxy Metrics cs.CL · 2026-05-18 · unverdicted · none · ref 29
Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments cs.CL · 2026-05-05 · unverdicted · none · ref 32
LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.
Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA cs.CL · 2026-04-19 · unverdicted · none · ref 44
Social identity markers in medical questions degrade LLM accuracy and uncertainty calibration, producing a calibration crisis that is non-additive for intersectional cases.
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models cs.CL · 2023-03-15 · unverdicted · none · ref 65
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering cs.CL · 2026-05-19 · unverdicted · none · ref 12
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.
A geometric relation of the error introduced by sampling a language model's output distribution to its internal state cs.LG · 2026-05-06 · unverdicted · none · ref 21 · 2 links
A geometric 1-form on token embeddings has curvature that couples to semantic world models in language models, as evidenced by clustering on chess board regions and piece importance.

The Eleventh International Conference on Learning Representations , year=

fields

years

verdicts

representative citing papers

citing papers explorer