LLM agents overcommit on non-complete tasks at 41.7% unless given explicit support-state categories, which raise typed deferral accuracy to 91.7%.
CoRR , volume =
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
A training-free method improves epistemic faithfulness of LLM textual explanations by guiding generation with attribution-based attention interventions.
Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text scenarios.
MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
citing papers explorer
-
Don't Start What You Can't Finish: A Counterfactual Audit of Support-State Triage in LLM Agents
LLM agents overcommit on non-complete tasks at 41.7% unless given explicit support-state categories, which raise typed deferral accuracy to 91.7%.
-
Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance
A training-free method improves epistemic faithfulness of LLM textual explanations by guiding generation with attribution-based attention interventions.
-
Aligning AI With Shared Human Values
Introduces ETHICS benchmark showing current language models have promising but incomplete ability to predict basic human ethical judgments on text scenarios.
-
Medical Model Synthesis Architectures: A Case Study
MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
-
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.