CodeBlock partitions code responses into syntactically coherent blocks, scores them with generalized cross-entropy and data-flow signals, and applies sparse supervision to achieve higher pass@1 than full SFT using 1.9% of tokens on six benchmarks.
hub
Rho-1: Not all tokens are what you need
14 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
unclear 1representative citing papers
A pair selection strategy based on negative similarity dynamics strengthens contrastive supervision in gloss-free sign language translation by reducing noisy negatives.
VocabTailor introduces a decoupled dynamic vocabulary selection framework that reduces vocabulary-related memory in SLMs by up to 99% with minimal task performance loss.
A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.
Training-Trajectory-Aware Token Selection (T3S) reconstructs the token-level training objective to overcome a performance bottleneck in continual distillation of reasoning capabilities from large to small language models.
LightReasoner distills supervision signals from SLM-LLM behavioral divergence to improve LLM reasoning on math benchmarks with up to 28.1% accuracy gains and 90-99% reductions in resources.
GIFT weights tokens by entropy during fine-tuning of diffusion language models and reports better performance than standard SFT on reasoning benchmarks across multiple settings.
ReGATE introduces a teacher-student adaptive token elision method that reduces training tokens to 38% while matching or exceeding baseline accuracy on multimodal benchmarks.
Step-DPO performs preference optimization on individual reasoning steps rather than complete answers, producing nearly 3% accuracy gains on MATH for 70B+ parameter models with 10K preference pairs.
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
RadarPLM adapts PLMs for marine radar target detection with lightweight adaptation and selective fine-tuning based on online learning values, reporting at least 6.35% average detection gains in low SCR conditions.
SPREG detects logical failures in LLM long-chain reasoning through real-time entropy spikes and performs structured plan repairs using historical distributions, reporting a 20% absolute accuracy gain on AIME25.
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
citing papers explorer
-
CODEBLOCK: Learning to Supervise Code at the Right Granularity
CodeBlock partitions code responses into syntactically coherent blocks, scores them with generalized cross-entropy and data-flow signals, and applies sparse supervision to achieve higher pass@1 than full SFT using 1.9% of tokens on six benchmarks.
-
Selective Contrastive Learning For Gloss Free Sign Language Translation
A pair selection strategy based on negative similarity dynamics strengthens contrastive supervision in gloss-free sign language translation by reducing noisy negatives.
-
VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models
VocabTailor introduces a decoupled dynamic vocabulary selection framework that reduces vocabulary-related memory in SLMs by up to 99% with minimal task performance loss.
-
Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.
-
Training-Trajectory-Aware Token Selection
Training-Trajectory-Aware Token Selection (T3S) reconstructs the token-level training objective to overcome a performance bottleneck in continual distillation of reasoning capabilities from large to small language models.
-
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
LightReasoner distills supervision signals from SLM-LLM behavioral divergence to improve LLM reasoning on math benchmarks with up to 28.1% accuracy gains and 90-99% reductions in resources.
-
GIFT: Guided Importance-Aware Fine-Tuning for Diffusion Language Models
GIFT weights tokens by entropy during fine-tuning of diffusion language models and reports better performance than standard SFT on reasoning benchmarks across multiple settings.
-
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
ReGATE introduces a teacher-student adaptive token elision method that reduces training tokens to 38% while matching or exceeding baseline accuracy on multimodal benchmarks.
-
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Step-DPO performs preference optimization on individual reasoning steps rather than complete answers, producing nearly 3% accuracy gains on MATH for 70B+ parameter models with 10K preference pairs.
-
DataComp-LM: In search of the next generation of training sets for language models
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
-
RadarPLM: Adapting Pre-trained Language Models for Marine Radar Target Detection by Selective Fine-tuning
RadarPLM adapts PLMs for marine radar target detection with lightweight adaptation and selective fine-tuning based on online learning values, reporting at least 6.35% average detection gains in low SCR conditions.
-
SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning
SPREG detects logical failures in LLM long-chain reasoning through real-time entropy spikes and performs structured plan repairs using historical distributions, reporting a 20% absolute accuracy gain on AIME25.
-
Learning to Reason at the Frontier of Learnability
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
- Understanding LoRA as Knowledge Memory: An Empirical Analysis