Automated data curation for robust language model fine-tuning. arXiv preprint arXiv:2403.12776
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
A joint task-model adaptation method learns optimal weights for data selection indicators via ICL proxies on small validation sets, matching or exceeding full-dataset fine-tuning performance with only 30% of samples on GSM8K.
citing papers explorer
- Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation
Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.
- Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies
A joint task-model adaptation method learns optimal weights for data selection indicators via ICL proxies on small validation sets, matching or exceeding full-dataset fine-tuning performance with only 30% of samples on GSM8K.