Introduces state commitment learning and Counterfactual Erasure RL (CERL) to train models to commit only persistent state, reducing answer dependence on hidden thoughts across math, logic, QA, and tool-use tasks without accuracy loss.
T oken S kip: Controllable chain-of-thought compression in LLM s
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Post-hoc model-based compression of reasoning traces cuts training tokens to 12-30% and speeds training 2-7.6x while retaining up to 96% of raw-trace accuracy, though raw traces remain superior at every scale.
citing papers explorer
-
State commitment learning: training language models to distinguish computation from memory
Introduces state commitment learning and Counterfactual Erasure RL (CERL) to train models to commit only persistent state, reducing answer dependence on hidden thoughts across math, logic, QA, and tool-use tasks without accuracy loss.
-
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation
Post-hoc model-based compression of reasoning traces cuts training tokens to 12-30% and speeds training 2-7.6x while retaining up to 96% of raw-trace accuracy, though raw traces remain superior at every scale.