RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.
Large language models lack essential metacognition for reliable medical reasoning.Nature Communications, 16, 01 2025
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
IndicBERT-HPA with language-aware adapters and verification-guided deferral outperforms baselines on multilingual orthopedic note classification, reaching 0.8792 Macro-F1 overall and 84.4% selective accuracy at 72.3% coverage.
citing papers explorer
-
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.
-
Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral
IndicBERT-HPA with language-aware adapters and verification-guided deferral outperforms baselines on multilingual orthopedic note classification, reaching 0.8792 Macro-F1 overall and 84.4% selective accuracy at 72.3% coverage.