hub Canonical reference

Self-evolving curriculum for LLM reasoning. arXiv preprint arXiv:2505.14970

Canonical reference. 88% of citing Pith papers cite this work as background.

17 Pith papers citing it
Background 88% of classified citations

citation-role summary

background: 7 · baseline: 1

years

2026: 14 · 2025: 3

representative citing papers

Learnability-Informed Fine-Tuning of Diffusion Language Models

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

LIFT is a learnability-informed SFT algorithm for diffusion LMs that aligns token difficulty with diffusion time steps, yielding up to 3x gains on AIME'24 and AIME'25 over standard SFT baselines.

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.

On the optimization dynamics of RLVR: Gradient gap and step size thresholds

cs.LG · 2025-10-09 · unverdicted · novelty 6.0

The paper defines a Gradient Gap for RLVR policy gradients and proves a sharp step-size threshold below which training converges and above which it collapses, with predictions for length and success-rate scaling validated in simulations and on Qwen2.5-Math-7B.

citing papers explorer

Showing 17 of 17 citing papers.