pith. machine review for the scientific record.

Transactions of the Association for Computational Linguistics

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

years

2026: 3 · 2022: 1

verdicts

unverdicted: 4

representative citing papers

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · unverdicted · novelty 8.0

Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

citing papers explorer

Showing 4 of 4 citing papers.

  • Online Learning-to-Defer with Varying Experts stat.ML · 2026-05-12 · unverdicted · none · ref 47

    Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

  • Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking cs.CL · 2026-05-11 · unverdicted · none · ref 15

    BICR uses blind-image contrastive ranking on frozen LVLM hidden states to train a lightweight probe that penalizes confidence on blacked-out inputs, yielding top calibration and discrimination across five models and multiple tasks at low parameter cost.

  • Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation cs.LG · 2026-04-21 · unverdicted · none · ref 39

    Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.

  • Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 201

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
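The summary above describes evaluating how well a model's self-reported answer confidence matches its actual accuracy. A standard way to quantify that match is expected calibration error (ECE). Below is a minimal sketch of the ECE computation; the confidence and correctness values are made-up illustrative data, not results from the cited paper.

```python
# Sketch: measuring calibration of self-reported answer confidence.
# ECE bins predictions by confidence and averages |accuracy - confidence|
# per bin, weighted by bin size. Illustrative data only.

def expected_calibration_error(confidences, correct, n_bins=5):
    """Standard ECE: sum over bins of (bin_size/n) * |bin_accuracy - bin_mean_confidence|."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into the last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# Hypothetical self-reported probabilities that each answer is correct,
# paired with whether the answer actually was correct.
conf = [0.9, 0.8, 0.95, 0.6, 0.3, 0.7, 0.85, 0.5]
ok = [1, 1, 1, 0, 0, 1, 1, 1]
print(expected_calibration_error(conf, ok))
```

A well-calibrated model, in the sense the summary uses, would drive this value toward zero: among answers it assigns 80% confidence, roughly 80% would be correct.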