REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
arXiv:2402.03744 [cs]

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
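A minimal sketch of the stated mechanism, assuming a precomputed set of valid editing directions and a black-box hallucination score; the finite-difference optimizer and the clipping constraint are illustrative choices, not the paper's actual method.

```python
import numpy as np

def optimize_direction_mix(directions, latent, halluc_score,
                           steps=200, lr=0.05, eps=1e-3):
    """Optimize continuous mixing weights over valid editing directions so the
    edited latent maximizes a hallucination score. Finite-difference ascent is
    used purely for illustration; REALISTA's optimizer may differ.

    directions   : (k, d) array of valid editing directions
    latent       : (d,) prompt latent to edit
    halluc_score : callable (d,) -> float, higher = more hallucination-inducing
    """
    k = directions.shape[0]
    w = np.zeros(k)                        # start from the unedited latent
    for _ in range(steps):
        base = halluc_score(latent + w @ directions)
        grad = np.zeros(k)
        for i in range(k):                 # finite-difference gradient in weight space
            w_try = w.copy()
            w_try[i] += eps
            grad[i] = (halluc_score(latent + w_try @ directions) - base) / eps
        w += lr * grad
        w = np.clip(w, -1.0, 1.0)          # keep the edit inside a "valid" box
    return latent + w @ directions, w
```

Because the weights are continuous, the attack searches a smooth subspace spanned by realistic edits rather than discrete token swaps, which is what keeps the resulting prompts natural-looking.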
15 Pith papers cite this work, alongside 581 external citations.
15 representative citing papers (2026)
-
Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction
LLMs prompted with increasing levels of text on TNO spectral reconstruction from photometry reveal an entropy floor where implementation variance persists, showing text alone cannot capture all tacit expert knowledge needed for exact replication.
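The measurement behind the claim is just the Shannon entropy of implementation choices across independent LLM runs at each level of prompt detail; when the entropy stops falling despite more text, that plateau is the floor. A toy illustration with hypothetical data:

```python
from collections import Counter
from math import log2

def choice_entropy(outcomes):
    """Shannon entropy (bits) of a list of discrete implementation choices
    (e.g., which interpolation scheme independent LLM runs picked)."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum(c / n * log2(c / n) for c in counts.values())

# Hypothetical runs: entropy drops with more prompt text but plateaus
# at a nonzero floor, the residual tacit-knowledge variance.
levels = {
    "abstract only":      ["linear", "spline", "gp", "spline", "linear", "gp"],
    "full methods":       ["spline", "spline", "gp", "spline", "gp", "spline"],
    "methods + comments": ["spline", "spline", "spline", "gp", "spline", "spline"],
}
for level, runs in levels.items():
    print(f"{level:>20}: H = {choice_entropy(runs):.2f} bits")
```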
-
Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models
SemGrad is a gradient-based uncertainty quantification technique for free-form LLM generation that operates in semantic space using a Semantic Preservation Score to select stable embeddings.
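A hedged sketch of the recipe, assuming a Hugging Face-style model that accepts `inputs_embeds`, embedding variants aligned with the probed tokens, and `sps` as a caller-supplied Semantic Preservation Score in roughly [0, 1]; the threshold and aggregation are illustrative.

```python
import torch

def semgrad_uncertainty(model, embeds, target_ids, sps, sps_threshold=0.9):
    """Gradient-based uncertainty: take the gradient of the sequence
    log-likelihood w.r.t. semantically perturbed input embeddings, keeping
    only variants whose Semantic Preservation Score says the meaning held.

    embeds     : list of (seq, d) tensors, perturbed embedding variants
    target_ids : (seq,) generated token ids whose likelihood we probe
    """
    norms = []
    for e in embeds:
        if sps(e) < sps_threshold:          # discard variants that drifted semantically
            continue
        e = e.clone().requires_grad_(True)
        logits = model(inputs_embeds=e.unsqueeze(0)).logits[0]
        logp = torch.log_softmax(logits, -1)[torch.arange(len(target_ids)), target_ids]
        (-logp.sum()).backward()
        norms.append(e.grad.norm().item())  # large gradient = unstable = uncertain
    return sum(norms) / max(len(norms), 1)
```

The SPS filter is the point of the method: raw embedding gradients confound semantic change with model uncertainty, so only perturbations that preserve meaning are allowed to contribute.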
-
Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference
Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.
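The arithmetic is compact: one call's accuracy estimates E[p], and agreement of two conditionally i.i.d. calls estimates E[p²]. A sketch of the moment-to-bound step, scanning two-point distributions consistent with the moments; this approximates the paper's bounds rather than reproducing their exact derivation.

```python
import numpy as np
from math import comb

def vote_acc(p, k):
    """P(strict majority of k conditionally-i.i.d. calls is correct), odd k."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(k // 2 + 1, k + 1))

def moment_bounds(m1, m2, k, grid=2000):
    """Bounds on majority-vote accuracy at budget k from the first two moments
    of the latent per-example correctness probability p:
    m1 = P(one call correct), m2 = P(two calls both correct) = E[p^2].
    Extremal distributions with fixed (m1, m2) are atomic, so we scan
    two-point candidates {a, b} with weight w on a."""
    lo, hi = 1.0, 0.0
    for a in np.linspace(0.0, m1 - 1e-9, grid):
        # b solves the quadratic implied by matching both moments
        c2, c1, c0 = m1 - a, a * a - m2, m2 * a - m1 * a * a
        disc = c1 * c1 - 4 * c2 * c0
        if disc < 0:
            continue
        for b in ((-c1 + np.sqrt(disc)) / (2 * c2),
                  (-c1 - np.sqrt(disc)) / (2 * c2)):
            if not (m1 < b <= 1.0):
                continue
            w = (b - m1) / (b - a)
            if not (0.0 <= w <= 1.0):
                continue
            acc = w * vote_acc(a, k) + (1 - w) * vote_acc(b, k)
            lo, hi = min(lo, acc), max(hi, acc)
    return lo, hi

# e.g. 78% single-call accuracy, 68% both-correct rate, 9-vote budget:
print(moment_bounds(0.78, 0.68, k=9))
```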
-
SENECA: Small-Sample Discrete Entropy Estimation via Self-Consistent Missing Mass
SENECA uses a novel self-consistent missing mass calculation to improve discrete entropy estimates in small-sample regimes and outperforms alternatives in numerical tests.
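One plausible reading of the mechanism, not the paper's actual estimator: use the Good-Turing missing mass to shrink observed probabilities, then add an entropy contribution for the unseen mass with a self-consistently chosen unseen-support size. All constants here are assumptions.

```python
from collections import Counter
from math import log

def missing_mass_entropy(samples):
    """Hedged sketch of a missing-mass-corrected entropy estimate (nats)."""
    n = len(samples)
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    m0 = min(n1 / n, 0.99)                 # Good-Turing missing mass, capped
    h_seen = 0.0
    for c in counts.values():
        p = (c / n) * (1.0 - m0)           # shrink observed probabilities
        h_seen -= p * log(p)
    # self-consistent piece: spread m0 over u unseen symbols, each weighted
    # roughly like a singleton (1/n), so u ~ m0 * n
    u = max(1, round(m0 * n))
    h_unseen = -m0 * log(m0 / u) if m0 > 0 else 0.0
    return h_seen + h_unseen

print(missing_mass_entropy(list("aabbbcdde") * 2 + list("fg")))
```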
-
Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench
On AgentProp-Bench, substring judging agrees with human raters at kappa = 0.049 versus 0.432 for an LLM-ensemble judge; injected bad parameters propagate with probability ~0.62; rejection and recovery are statistically independent; and a runtime mitigation cuts hallucinations by 23 percentage points on GPT-4o-mini but not on Gemini-2.0-Flash.
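For reference, the agreement statistic being reported is Cohen's kappa, which discounts chance agreement; the labels below are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two binary judges
    (e.g., a substring matcher vs. human labels)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n               # marginal positive rates
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)          # chance agreement
    return (po - pe) / (1 - pe)

human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical human verdicts
judge = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # hypothetical substring-judge verdicts
print(cohens_kappa(human, judge))         # near zero: barely above chance
```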
-
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
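A minimal sketch of the joint-training idea in PyTorch, assuming a Hugging Face-style causal LM backbone; the head shape and loss weight are illustrative, not RAGognizer's exact design.

```python
import torch
import torch.nn as nn

class DetectionHeadLM(nn.Module):
    """LM backbone plus a token-level hallucination-detection head,
    trained jointly on generation and detection."""
    def __init__(self, backbone, hidden_size):
        super().__init__()
        self.backbone = backbone                    # returns logits + hidden states
        self.detect = nn.Linear(hidden_size, 2)     # per-token {faithful, hallucinated}

    def forward(self, input_ids, labels=None, halluc_labels=None):
        out = self.backbone(input_ids=input_ids, labels=labels,
                            output_hidden_states=True)
        det_logits = self.detect(out.hidden_states[-1])
        loss = out.loss if out.loss is not None else torch.tensor(0.0)
        if halluc_labels is not None:               # joint objective: LM + detection
            det_loss = nn.functional.cross_entropy(
                det_logits.flatten(0, 1), halluc_labels.flatten())
            loss = loss + 0.5 * det_loss            # 0.5 is an illustrative weight
        return loss, det_logits
```

Sharing the backbone is what lets detection supervision also reduce hallucination in generation, rather than only flagging it after the fact.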
-
OSCAR: Orchestrated Self-verification and Cross-path Refinement
OSCAR reduces hallucinations in diffusion language models by localizing commitment uncertainty with cross-chain entropy on parallel trajectories and applying evidence-guided remasking.
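A sketch of the localization step as stated: run parallel denoising trajectories, compute per-position entropy over the tokens each chain committed to, and remask the most uncertain positions. The fraction remasked is an illustrative constant.

```python
import numpy as np

def remask_by_cross_chain_entropy(chains, top_frac=0.2):
    """chains : (n_chains, seq_len) int array of committed token ids.
    Returns positions with the highest cross-chain entropy, i.e. where
    parallel trajectories disagree and remasking is warranted."""
    n, L = chains.shape
    ent = np.zeros(L)
    for pos in range(L):
        _, counts = np.unique(chains[:, pos], return_counts=True)
        p = counts / n
        ent[pos] = -(p * np.log(p)).sum()   # entropy across chains at this position
    k = max(1, int(top_frac * L))
    return np.argsort(ent)[-k:]             # positions to remask and re-denoise

chains = np.array([[5, 9, 2, 7], [5, 3, 2, 7], [5, 8, 2, 7]])
print(remask_by_cross_chain_entropy(chains))  # position 1 disagrees across chains
```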
-
Using Semantic Distance to Estimate Uncertainty in LLM-Based Code Generation
Semantic distance on program execution behaviors improves uncertainty estimation for LLM code generation and outperforms prior sample-based methods across benchmarks and models.
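The core move is to compare what sampled programs *do*, not how they read. A simplest-instance sketch using 0/1 output disagreement as the behavioral distance; the paper's distance may be richer, and `run` is a caller-supplied executor.

```python
def behavioral_uncertainty(programs, test_inputs, run):
    """Mean pairwise disagreement of program outputs on shared inputs.
    High disagreement = high uncertainty about the generated code."""
    sigs = [tuple(run(p, x) for x in test_inputs) for p in programs]
    n, dis = len(sigs), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dis += sum(a != b for a, b in zip(sigs[i], sigs[j])) / len(test_inputs)
    return 2 * dis / (n * (n - 1))

# Hypothetical candidates for "absolute value"; the third is buggy:
progs = ["lambda x: abs(x)", "lambda x: x if x > 0 else -x", "lambda x: x"]
print(behavioral_uncertainty(progs, [-2, 0, 3], lambda p, x: eval(p)(x)))
```

Two syntactically different but equivalent programs contribute zero distance here, which is exactly why execution-based distances beat surface-form sample comparisons.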
-
OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control
OracleTSC introduces a reward hurdle and uncertainty regularization to stabilize LLM-based reinforcement learning for traffic signal control, delivering 75% lower travel time and 67% shorter queue lengths on benchmarks while also generalizing across intersections.
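A sketch of the two stabilizers the title names, with an invented functional form: positive signal only when reward clears an oracle-informed baseline, discounted by the uncertainty of the LLM's value estimate. Constants and shape are assumptions.

```python
def shaped_reward(raw_reward, oracle_baseline, uncertainty,
                  hurdle_margin=0.0, beta=0.1):
    """(1) reward hurdle: only improvement over an oracle-informed baseline
    counts as positive signal; (2) uncertainty regularization: discount
    updates driven by noisy LLM value estimates."""
    hurdled = raw_reward - (oracle_baseline + hurdle_margin)
    return hurdled - beta * uncertainty

# e.g. raw travel-time reward -120 vs. oracle baseline -150, noisy estimate:
print(shaped_reward(raw_reward=-120.0, oracle_baseline=-150.0, uncertainty=4.0))
```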
-
Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
Decision theory shows that LLM cascades are structurally limited by always incurring the cheap model's cost before deciding to escalate, with the best performance given by the envelope of pairwise cascades rather than fixed chains or many stages.
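The structural limit is visible in the expected-cost arithmetic: the cheap model's cost is unconditional, the expensive model's is paid only on escalation. A sketch of one pair's cost-accuracy curve (the paper's envelope result compares such curves across all pairs); accuracy handling here is a simplifying assumption.

```python
import numpy as np

def cascade_cost_acc(p_escalate, acc_cheap, acc_big, c_cheap, c_big):
    """Expected (cost, accuracy) of a two-model cascade. The cheap cost is
    always incurred before the escalation decision; that unconditional term
    is the structural limitation being analyzed."""
    cost = c_cheap + p_escalate * c_big
    # simplifying assumption: escalated items get big-model accuracy
    acc = (1 - p_escalate) * acc_cheap + p_escalate * acc_big
    return cost, acc

# illustrative numbers: cheap model 72% at cost 1, big model 90% at cost 10
for q in np.linspace(0, 1, 11):
    cost, acc = cascade_cost_acc(q, 0.72, 0.90, 1.0, 10.0)
    print(f"escalate {q:.1f}: cost={cost:5.1f}  acc={acc:.3f}")
```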
-
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
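The detection recipe is density estimation on residual-stream activations from truthful generations, then flagging low-likelihood states. The stand-in below uses a diagonal Gaussian where the paper uses probabilistic circuits (more expressive tractable density models); data and threshold are hypothetical.

```python
import numpy as np

def fit_density(acts):
    """Fit a diagonal Gaussian to residual-stream activations from truthful
    generations; returns a log-likelihood function for anomaly scoring."""
    mu, sd = acts.mean(0), acts.std(0) + 1e-6
    def loglik(x):
        z = (x - mu) / sd
        return -0.5 * (z**2 + np.log(2 * np.pi * sd**2)).sum(-1)
    return loglik

rng = np.random.default_rng(0)
truthful = rng.normal(0, 1, (500, 16))        # hypothetical residual features
loglik = fit_density(truthful)
tau = np.quantile(loglik(truthful), 0.05)     # anomaly threshold at 5th percentile
sample = rng.normal(3, 1, 16)                 # off-distribution activation
print("flag hallucination:", loglik(sample) < tau)
```

Dynamic correction then intervenes only on flagged states, which is what keeps unnecessary output corruption low.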
-
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
Symmetric spectral diagnostics on attention are structurally blind to flow direction, with asymmetry G as the sole control parameter, yielding a two-axis test that distinguishes bottleneck versus diffuse hallucination modes with opposite polarity.
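The blindness claim follows from a basic decomposition: any attention matrix splits into a symmetric part (all a symmetric spectral diagnostic can see) and an antisymmetric part carrying flow direction. A sketch of the asymmetry ratio; the paper's G may be normalized differently.

```python
import numpy as np

def attention_asymmetry(A):
    """Ratio of the directional (antisymmetric) component to the symmetric
    component of an attention matrix, i.e. the information that symmetric
    spectral diagnostics discard."""
    S = 0.5 * (A + A.T)     # what symmetric diagnostics see
    K = 0.5 * (A - A.T)     # flow-direction component they are blind to
    return np.linalg.norm(K) / (np.linalg.norm(S) + 1e-12)

A = np.array([[0.7, 0.3, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.6, 0.4]])   # row-stochastic toy attention
print(f"G ~ {attention_asymmetry(A):.3f}")
```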
-
The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive
LLM token rank-frequency distributions converge to a shared Mandelbrot distribution across models and domains, enabling a microsecond-scale statistical primitive for provenance verification and black-box anomaly triage.
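The primitive is a Zipf-Mandelbrot fit, f(r) ∝ (r + b)^(-a), to a text's token rank-frequency curve, compared against the shared reference curve. A sketch using a grid search over the offset; fitting details are illustrative.

```python
import numpy as np
from collections import Counter

def mandelbrot_fit(tokens):
    """Fit log f(r) = log C - a*log(r + b) to a rank-frequency curve via a
    small grid search over b. Compare (a, b) of a suspect text against the
    reference curve; large deviation triggers provenance/anomaly triage."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    best = None
    for b in np.linspace(0.0, 10.0, 101):
        x = np.log(ranks + b)
        slope, logC = np.polyfit(x, np.log(freqs), 1)
        resid = ((np.log(freqs) - (slope * x + logC))**2).mean()
        if best is None or resid < best[0]:
            best = (resid, -slope, b)       # exponent a = -slope
    return best[1], best[2]                 # (exponent a, offset b)

print(mandelbrot_fit("the cat sat on the mat and the dog sat too".split()))
```

Since the fit needs only token counts, it runs in microseconds and requires no model access, which is what makes it viable as a black-box triage primitive.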
-
Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations
LLM hallucinations arise from task-dependent basins in latent space, with separability varying by task and geometry-aware steering reducing their probability.
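A minimal sketch of geometry-aware steering under the basin picture: push a hidden state away from the nearest hallucination-basin centroid. Centroids would come from labeled hallucinated generations; the paper's actual steering rule may differ.

```python
import numpy as np

def steer_away(h, basin_centroids, strength=0.5):
    """Move a hidden state along the unit direction away from the nearest
    hallucination-basin centroid, lowering the chance of settling there."""
    d = np.linalg.norm(basin_centroids - h, axis=1)
    c = basin_centroids[np.argmin(d)]                 # nearest basin
    away = (h - c) / (np.linalg.norm(h - c) + 1e-12)  # unit escape direction
    return h + strength * away

h = np.array([0.2, 0.1])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])        # hypothetical basin centers
print(steer_away(h, centroids))
```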