Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning

Aiwei Liu; Hao Zhang; Irwin King; Jiahong Liu; Minda Hu; Shaohang Wei; Wenhao Yu; Yifan Li

arxiv: 2602.01745 · v2 · pith:732K43PW · submitted 2026-02-02 · 💻 cs.LG · cs.AI

keywords fine-tuning · indicator · rank · alignment · calibration · entropy · ground-truth · ignoring

Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-dimensional: the ground-truth probability reflects downstream alignment, while token entropy reflects intrinsic uncertainty induced by the pre-training prior. Ignoring entropy can misidentify noisy or easily replaceable tokens as learning-critical, while ignoring probability fails to reflect target-specific alignment. RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens without over-penalizing intrinsically uncertain positions. Experiments on multiple backbones show consistent improvements on mathematical reasoning benchmarks, transfer gains on out-of-distribution reasoning, and pre code generation performance over probability-only or entropy-only reweighting baselines.
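The abstract does not spell out the exact formula, so the following is only a minimal sketch of how a rank-versus-expected-rank weight could be computed, under stated assumptions. It assumes ranks are 1-based positions in the model's probability-sorted vocabulary, that the Relative Rank Indicator is expected rank divided by ground-truth rank (so it exceeds 1 when the ground truth ranks better than a token sampled from the model would), and that the Relative Scale is simply the inverse, applied directly as a per-token weight on the negative log-likelihood. The function names `relative_scale` and `reweighted_nll` are hypothetical, and any normalization, clipping, or gradient detaching that RankTuner actually uses is not shown.

```python
import numpy as np


def relative_scale(probs: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Hypothetical per-token Relative Scale: rank(ground truth) / expected rank.

    probs:  (T, V) next-token distributions, each row summing to 1.
    target: (T,) ground-truth token ids.
    """
    T = probs.shape[0]
    # Sort each row by descending probability; the stable sort breaks ties by token id.
    order = np.argsort(-probs, axis=1, kind="stable")
    # Inverting the permutation gives each token's 1-based rank (1 = most probable).
    ranks = np.argsort(order, axis=1) + 1
    gt_rank = ranks[np.arange(T), target]
    # Expected rank of a token sampled from the model's own distribution;
    # it tends to grow as the distribution flattens (higher entropy).
    expected_rank = (probs * ranks).sum(axis=1)
    # Assumed indicator: > 1 when the ground truth ranks better than expected.
    indicator = expected_rank / gt_rank
    # Assumed Relative Scale: the inverse indicator, used as the token weight.
    return 1.0 / indicator


def reweighted_nll(probs: np.ndarray, target: np.ndarray) -> float:
    """Mean token NLL scaled by the (hypothetical) Relative Scale."""
    T = probs.shape[0]
    weights = relative_scale(probs, target)
    nll = -np.log(probs[np.arange(T), target])
    # Weights are constants here; in an autograd framework they would likely be detached.
    return float(np.mean(weights * nll))


probs = np.array([[0.7, 0.2, 0.1],     # peaked distribution
                  [0.4, 0.35, 0.25]])  # flatter, higher-entropy distribution
target = np.array([1, 1])              # ground truth ranked 2nd at both positions
print(relative_scale(probs, target))
print(reweighted_nll(probs, target))
```

Both positions rank the ground truth second, but the expected rank is 1.4 for the peaked row and 1.85 for the flatter row, so the weights come out near 1.43 and 1.08. Under these assumptions, the same rank counts as "more under-learned" where the model is confident and is penalized less where the prediction is intrinsically uncertain, which matches the behavior the abstract describes.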

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A3M: Adaptive, Adversarial and Multi-Objective Learning for Strategic Bidding in Repeated Auctions
cs.CL 2026-06 unverdicted novelty 5.0

A3M integrates adaptive DRL, adversarial opponent modeling, and multi-objective rewards to cut regret 30-40% versus baselines while remaining robust to strategy shifts in repeated auctions.
PriFT: Prior-Support Guided Supervised Fine-Tuning
cs.CL 2026-06 unverdicted novelty 5.0

PriFT uses token reweighting signals from a frozen pretrained model to stabilize SFT and achieve better results than standard SFT baselines on reasoning tasks.
EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
cs.CL 2026-05 unverdicted novelty 5.0

EGAD adaptively distills LLM knowledge at the token level by using entropy to create a curriculum from low- to high-entropy tokens, adjust temperature, and switch between logits-only and feature-based branches.
EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control
cs.CL 2026-06 unverdicted novelty 4.0

EVLA combines a Unified Co-State Encoder and Electro-aware Structured Reasoning Chain with physics-guided training to produce energy-optimal driving decisions, reporting +5.6% accuracy gains over fine-tuned VLM baseli...
FinInvest-GTCN: Explainable Graph-Temporal-Causal Modeling for Risk-Aware Investment Decision Optimization
cs.CL 2026-06 unverdicted novelty 4.0

FinInvest-GTCN combines graph, temporal, and causal networks with meta-causal adaptation to improve risk-adjusted predictions for VC investments, achieving RA-MSE of 2.51 and 18.7% higher simulated returns on propriet...