Introduces state commitment learning and Counterfactual Erasure RL (CERL) to train models to commit only persistent state, reducing answer dependence on hidden thoughts across math, logic, QA, and tool-use tasks without accuracy loss.
T oken S kip: Controllable Chain-of-Thought Compression in LLM s
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Post-hoc model-based compression of reasoning traces cuts training tokens to 12-30% and speeds training 2-7.6x while retaining up to 96% of raw-trace accuracy, though raw traces remain superior at every scale.
CAT uses intrinsic confidence signals in preference optimization to adapt reasoning length in LRMs, outperforming uniform compression baselines on accuracy across benchmarks.
citing papers explorer
-
State commitment learning: training language models to distinguish computation from memory
Introduces state commitment learning and Counterfactual Erasure RL (CERL) to train models to commit only persistent state, reducing answer dependence on hidden thoughts across math, logic, QA, and tool-use tasks without accuracy loss.
-
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation
Post-hoc model-based compression of reasoning traces cuts training tokens to 12-30% and speeds training 2-7.6x while retaining up to 96% of raw-trace accuracy, though raw traces remain superior at every scale.
-
CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models
CAT uses intrinsic confidence signals in preference optimization to adapt reasoning length in LRMs, outperforming uniform compression baselines on accuracy across benchmarks.