{"work":{"id":"c5a99022-15b6-4d77-9850-23036df7a073","openalex_id":null,"doi":null,"arxiv_id":"2603.05433","raw_key":null,"title":"CRISP: Compressed Reasoning via Iterative Self-Policy Distillation","authors":null,"authors_text":null,"year":2026,"venue":"cs.LG","abstract":"Reasoning models think out loud, but much of what they say is noise. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a 'be concise' instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: CRISP automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve 57--59% token reduction on MATH-500 while improving accuracy by 9--16 points absolute. On AIME 2024, the 14B model gains 10 points with 41% compression. Ablations show that qualitative conciseness instructions outperform explicit token targets, and periodic teacher refreshes yield a broad stable regime. The method generalizes across model families -- DeepSeek-R1-Distill-Llama-8B improves accuracy by up to 5 points with 17--32% compression -- and transfers beyond math to multi-step agentic planning (DeepPlanning), reducing token usage by 42--51% while preserving planning quality. Code is available at https://github.com/HJSang/OPSD_Reasoning_Compression.","external_url":"https://arxiv.org/abs/2603.05433","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-14T21:33:00.115825+00:00","pith_arxiv_id":"2603.05433","created_at":"2026-05-09T05:45:22.421711+00:00","updated_at":"2026-05-14T21:33:00.115825+00:00","title_quality_ok":true,"display_title":"CRISP: Compressed Reasoning via Iterative Self-Policy Distillation","render_title":"CRISP: Compressed Reasoning via Iterative Self-Policy Distillation"},"hub":{"state":{"work_id":"c5a99022-15b6-4d77-9850-23036df7a073","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":14,"external_cited_by_count":null,"distinct_field_count":4,"first_pith_cited_at":"2026-04-03T15:50:07+00:00","last_pith_cited_at":"2026-05-13T15:05:30+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-14T21:56:16.968566+00:00","tier_text":"hub"},"tier":"hub","role_counts":[],"polarity_counts":[],"runs":{},"summary":{},"graph":{},"authors":[]}}