Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
GAMMAF provides a benchmarking platform with data generation and defense evaluation pipelines for graph-based anomaly detection in LLM multi-agent systems, demonstrating improved integrity and lower operational costs when remediation is applied.
citing papers explorer
- GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems