EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Presents a distributional model of linguistic confidence, Faithfulness Divergence metric, and RALC pipeline that boosts faithfulness and calibration on QA benchmarks across LLM families.
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
citing papers explorer
-
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
-
Retrieval-Augmented Linguistic Calibration
Presents a distributional model of linguistic confidence, Faithfulness Divergence metric, and RALC pipeline that boosts faithfulness and calibration on QA benchmarks across LLM families.
-
GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.