{"paper":{"title":"DIVER:Diving Deeper into Distilled Data via Expressive Semantic Recovery","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A dual-stage framework uses a pre-trained diffusion model to recover expressive semantics from distilled datasets and improve performance across different neural architectures.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Guoming Lu, Jiawei Du, Jielei Wang, Qianxin Xia, Wenbo Jiang, Zhiyong Shu","submitted_at":"2026-05-12T18:55:53Z","abstract_excerpt":"Dataset distillation aims to synthesize a compact proxy dataset that is unreadable or non-raw from the original dataset for privacy protection and highly efficient learning. However, previous approaches typically adopt a single-stage distillation paradigm, which suffers from learning specific patterns that overfit on a prior architecture, consequently suppressing the expression of semantics and leading to performance degradation across heterogeneous architectures. To address this issue, we propose a novel dual-stage distillation framework called ${\\textbf{DIVER}}$, which leverages the pre-trai"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"DIVER leverages the pre-trained diffusion model to dive deeper into distilled data via expressive semantic recovery, an entire process of semantic inheritance, guidance, and fusion... significantly improving cross-architecture generalization, requiring processing time comparable to raw DiT on ImageNet (256×256) with only 4 GB of GPU memory usage.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the pre-trained diffusion model can reliably filter architecture-specific noise in the latent space while preserving intrinsic semantics, and that applying semantic guidance only in the concrete phase of the reverse process avoids ambiguity and artifacts without losing essential information.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"DIVER is a dual-stage distillation method using diffusion models to enhance semantic preservation and cross-architecture generalization in dataset distillation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A dual-stage framework uses a pre-trained diffusion model to recover expressive semantics from distilled datasets and improve performance across different neural architectures.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"b3339534b7e133fc3e0170b8783b56b00dc0f9079446d8db0d13ac2ef97f5d9d"},"source":{"id":"2605.12649","kind":"arxiv","version":1},"verdict":{"id":"e06c086f-7eb1-4924-9eaa-e10050547e63","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T21:11:44.677198Z","strongest_claim":"DIVER leverages the pre-trained diffusion model to dive deeper into distilled data via expressive semantic recovery, an entire process of semantic inheritance, guidance, and fusion... significantly improving cross-architecture generalization, requiring processing time comparable to raw DiT on ImageNet (256×256) with only 4 GB of GPU memory usage.","one_line_summary":"DIVER is a dual-stage distillation method using diffusion models to enhance semantic preservation and cross-architecture generalization in dataset distillation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the pre-trained diffusion model can reliably filter architecture-specific noise in the latent space while preserving intrinsic semantics, and that applying semantic guidance only in the concrete phase of the reverse process avoids ambiguity and artifacts without losing essential information.","pith_extraction_headline":"A dual-stage framework uses a pre-trained diffusion model to recover expressive semantics from distilled datasets and improve performance across different neural architectures."},"references":{"count":62,"sample":[{"doi":"","year":null,"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","ref_index":1,"cited_arxiv_id":"1503.02531","is_internal_anchor":true},{"doi":"","year":null,"title":"Dataset distillation","work_id":"e5036812-7ef1-4616-8677-c754d141d74f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision , pages=","work_id":"936ed1c3-8aa1-45f3-8498-656dc58b15e3","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2006,"title":"Dataset condensation with gradient matching","work_id":"d8fcd60b-5c36-46b6-9812-2613367634e7","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=","work_id":"8f83fccf-91dc-4aee-b3b1-d11acd3418b9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":62,"snapshot_sha256":"471f6b40f87e0dbf210475ede72af90fee97498ea7b27171673388b32ad19c72","internal_anchors":5},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}