Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, Jeff Dean · 2015 · stat.ML · arXiv 1503.02531

Canonical reference. 80% of citing Pith papers cite this work as background.

731 Pith papers citing it

Background: 80% of classified citations

abstract

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
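The distillation idea in the abstract can be made concrete with a small scalar example. The snippet below is a minimal, hypothetical sketch in plain Python, not the authors' code, of the soft-target objective described in the full paper. Each teacher's logits are divided by a temperature `T` before the softmax, and the softened distributions are averaged into ensemble soft targets. The student is then scored on a weighted sum of two terms: the soft-target cross-entropy, multiplied by `T**2` because soft-target gradients shrink as 1/T², and the ordinary hard-label cross-entropy at T = 1. The helper names, the arithmetic mean over teachers, and the defaults `temperature=4.0` and `alpha=0.5` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Log-softmax of logits / temperature, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax at temperature T; a larger T yields a softer (higher-entropy) distribution."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


def ensemble_soft_targets(teacher_logits: Sequence[Sequence[float]], temperature: float) -> List[float]:
    """Arithmetic mean of the teachers' temperature-softened distributions (an assumed choice)."""
    dists = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(dists[0])
    return [sum(d[k] for d in dists) / len(dists) for k in range(num_classes)]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    hard_label: int,
    temperature: float = 4.0,  # hypothetical default temperature
    alpha: float = 0.5,        # hypothetical weight on the soft-target term
) -> float:
    """Weighted sum of soft-target and hard-label cross-entropies for one example."""
    targets = ensemble_soft_targets(teacher_logits, temperature)
    student_log_probs_t = log_softmax(student_logits, temperature)
    # Cross-entropy between the ensemble's soft targets and the student's softened predictions.
    soft_ce = -sum(t * lp for t, lp in zip(targets, student_log_probs_t))
    # Ordinary cross-entropy against the true label, at temperature 1.
    hard_ce = -log_softmax(student_logits, 1.0)[hard_label]
    # Soft-target gradients scale as 1/T^2, so multiply by T^2 to keep the two terms balanced.
    return alpha * temperature ** 2 * soft_ce + (1.0 - alpha) * hard_ce


if __name__ == "__main__":
    teachers = [[3.0, 1.0, 0.0], [2.5, 1.5, 0.0]]  # made-up logits from two teacher models
    print(distillation_loss([2.0, 1.0, 0.5], teachers, hard_label=0))
```

In practice the same computation would run over batches inside an autodiff framework so the student can be updated by gradient descent; the scalar version only shows how the ensemble averaging, the temperature, and the two loss terms fit together.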

citation-role summary

background: 71 · method: 14 · other: 2 · dataset: 1

citation-polarity summary

background: 70 · use method: 13 · unclear: 3 · support: 1 · use dataset: 1

authors

Geoffrey Hinton, Oriol Vinyals, Jeff Dean

representative citing papers

Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback

cs.LG · 2026-06-29 · unverdicted · novelty 8.0

Noisy expert imitation learning requires exponential samples for offline methods but polynomial for a variant of on-policy distillation under a noise condition.

Proofs of Ownership for Machine Learning Models

cs.LG · 2026-06-29 · unverdicted · novelty 8.0

A formal game-based study establishes that black-box proofs of ownership for ML classifiers are possible precisely when the concept class is not self-correctable.

PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs

cs.AI · 2026-05-11 · conditional · novelty 8.0

PrimeKG-CL supplies the first continual graph learning benchmark using authentic temporal snapshots from nine biomedical databases, showing strong interactions between embedding decoders and learning strategies plus limits of standard metrics on retention versus forgetting.

Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

Inference-time refinement of pre-trained tabular diffusion models via Bidirectional Chamfer Refinement achieves a median 8.6% downstream-performance gain over real data across 15 benchmarks while preserving fidelity and privacy.

Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters

quant-ph · 2026-05-07 · unverdicted · novelty 8.0

Cayley unitary adapters executed on real quantum hardware improve LLM perplexity by 1.4% on Llama 3.1 8B with 6000 parameters and recover 83% of compression-induced degradation on SmolLM2.

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

cs.CL · 2023-05-12 · conditional · novelty 8.0

Tiny language models under 10M parameters trained on a synthetic children's story dataset generate fluent, consistent, multi-paragraph English text with near-perfect grammar and reasoning.

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Purified OPSD: On-Policy Self-Distillation Without Losing How to Think

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

Purified OPSD subtracts a reference-only teacher's signal from standard OPSD supervision and applies PMI to create a cleaner distillation target, yielding gains on long-CoT models while preserving epistemic behavior.

Dynamic Bidirectional Pattern Memory: A Production-Scale Empirical Characterisation of Inference-Time Gating in Clinical NLP

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

Empirical study on production-scale clinical NLP shows direct learning from verifier rejections fails due to sparse data while fixed ontology and evidence-support filters succeed, with selectivity determined by matching verifier evidence.

TallyTrain: Communication-Efficient Federated Distillation

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

TallyTrain is a hard-label distillation protocol for federated learning that uses argmax transmission and optional sparse merges to match soft-label performance at up to 1000x lower communication cost.

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Cortex uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.

RPM-Distill: Physiology-guided Adaptive Cross-modal Distillation for Robust Remote Physiological Measurement

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

RPM-Distill uses synchronized radar only at training time to distill spectral periodic features into a video model via adaptive per-sample gating, yielding 81% lower MAE on remote physiological measurement tasks.

Learning 1-Bit LiDAR-based Localization with Auxiliary Objective

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

BiLoc is the first binary neural network framework for 6-DoF LiDAR pose estimation that uses an auxiliary objective to adaptively regulate information retention and achieve SOTA among BNNs on large outdoor datasets.

Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.

Knowledge Cascade: Reverse Knowledge Distillation on Nonparametric Multivariate Functional Estimation

stat.ME · 2026-06-24 · unverdicted · novelty 7.0

KCas transfers student-selected smoothing parameters to full-sample teacher models via asymptotic scaling laws in smoothing splines and kernel methods, cutting computation while retaining performance guarantees.

REDI-Match: Rotation-Equivariant Distillation for Efficient and Robust Dense Matching

cs.CV · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

REDI-Match uses rotation-equivariant distillation to transfer VFM semantics into a strictly equivariant encoder plus an entropy-driven alignment module, claiming SOTA accuracy and 1.9x speed on rotation-heavy benchmarks.

Channel Location Constrains the Auditability of Subliminal Learning

cs.LG · 2026-06-20 · unverdicted · novelty 7.0

Auditability of subliminal learning is constrained by channel location, with initialization-dependent body channels allowing pre-training screens while vocabulary geometry and conditional body channels evade them.

Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization

cs.CV · 2026-06-19 · unverdicted · novelty 7.0

PAPT uses adversarial prompt tuning on diffusion models to generate domain-style images while preserving category features, claiming superior single-domain generalization performance.

S-JEPA: Soft Clustering Anchors for Self-Supervised Speech Representation Learning

cs.SD · 2026-06-17 · unverdicted · novelty 7.0

S-JEPA uses soft GMM posteriors in a JEPA framework for self-supervised speech learning, achieving the lowest WER among models under 90M parameters without offline re-clustering.

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

SC-GRPO improves RL with verifiable rewards by multiplying GRPO gradients with self-induced per-token KL divergence, outperforming GRPO by 8.1% and DAPO by 5.9% on math, code, and agent benchmarks.

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

ZPPO improves distillation to small vision-language models by using binary and negative candidate prompts plus a replay buffer for hard questions, outperforming standard distillation and GRPO on a 31-benchmark suite with largest gains at the 0.8B scale.

Learning from the Self-future: On-policy Self-distillation for dLLMs

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

d-OPSD reframes on-policy self-distillation for dLLMs via suffix conditioning from self-generated answers and step-level supervision, outperforming RLVR and SFT on reasoning benchmarks with ~10% of the optimization steps.

citing papers explorer

Showing 50 of 178 citing papers after filters.

Emerging Properties in Self-Supervised Vision Transformers · cs.CV · 2021-04-29 · conditional · none · ref 35
Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
RPM-Distill: Physiology-guided Adaptive Cross-modal Distillation for Robust Remote Physiological Measurement · cs.CV · 2026-06-26 · unverdicted · none · ref 16
RPM-Distill uses synchronized radar only at training time to distill spectral periodic features into a video model via adaptive per-sample gating, yielding 81% lower MAE on remote physiological measurement tasks.
Learning 1-Bit LiDAR-based Localization with Auxiliary Objective · cs.CV · 2026-06-26 · unverdicted · none · ref 27
BiLoc is the first binary neural network framework for 6-DoF LiDAR pose estimation that uses an auxiliary objective to adaptively regulate information retention and achieve SOTA among BNNs on large outdoor datasets.
Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge · cs.CV · 2026-06-25 · unverdicted · none · ref 9
LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.
REDI-Match: Rotation-Equivariant Distillation for Efficient and Robust Dense Matching · cs.CV · 2026-06-23 · unverdicted · none · ref 19 · 2 links
REDI-Match uses rotation-equivariant distillation to transfer VFM semantics into a strictly equivariant encoder plus an entropy-driven alignment module, claiming SOTA accuracy and 1.9x speed on rotation-heavy benchmarks.
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization · cs.CV · 2026-06-19 · unverdicted · none · ref 26
PAPT uses adversarial prompt tuning on diffusion models to generate domain-style images while preserving category features, claiming superior single-domain generalization performance.
World Model Self-Distillation: Training World Models to Solve General Tasks · cs.CV · 2026-06-10 · unverdicted · none · ref 25
Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.
Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks · cs.CV · 2026-06-09 · unverdicted · none · ref 43
The paper constructs the VIBE benchmark and evaluates six visual in-context learning models on 14 datasets, 12 tasks, and 106 combinations under a unified one-shot protocol, revealing limitations and failure modes.
Ego-METAS: Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark · cs.CV · 2026-05-29 · unverdicted · none · ref 23
Ego-METAS is a new benchmark providing unified egocentric video data, splits, features and baselines for online multimodal temporal action segmentation under hardware-representative energy constraints.
Slimmable ConvNeXt: Width-Adaptive Inference for Efficient Multi-Device Deployment · cs.CV · 2026-05-21 · unverdicted · none · ref 12
Slimmable ConvNeXt adapts ConvNeXt for width-adaptive inference using LayerNorm and inverted bottlenecks, reaching 80.8% top-1 at 4.5 GMACs and outperforming HydraViT, MatFormer, and SortedNet on ImageNet-1k.
Visual-Advantage On-Policy Distillation for Vision-Language Models · cs.CV · 2026-05-21 · unverdicted · none · ref 8
VA-OPD improves VLM performance over standard on-policy distillation by reweighting rollouts and separating KL terms according to token-level visual advantage on math and visual benchmarks.
Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning · cs.CV · 2026-05-19 · unverdicted · none · ref 29
Proposes weighted aggregation of clusters and self-distillation-driven token pruning to improve both accuracy and efficiency in ViT-based visual place recognition.
Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation · cs.CV · 2026-05-13 · unverdicted · none · ref 27
Genetic programming evolves heterogeneous layer-specific scalar functions to approximate layer normalization in pre-trained ViTs, capturing 91.6% variance versus 70.2% for uniform baselines and recovering 84.25% ImageNet Top-1 accuracy after 20 epochs of adaptation.
Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning · cs.CV · 2026-05-13 · unverdicted · none · ref 17
SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.
DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers · cs.CV · 2026-05-12 · unverdicted · none · ref 14
DORA uses an online RL agent to adaptively merge tokens in Vision Transformers, reporting better accuracy-efficiency trade-offs than static baselines on ImageNet and OOD sets.
Tracing Like a Clinician: Anatomy-Guided Spatial Priors for Cephalometric Landmark Detection · cs.CV · 2026-05-05 · conditional · none · ref 30
Anatomy-guided spatial priors function as training-time regularizers to achieve 1.04 mm mean radial error in cephalometric landmark detection, with a prior matrix isolating their effect.
Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis · cs.CV · 2026-05-01 · unverdicted · none · ref 13
CT-Lite combines Feature Attention Style Transfer (FAST) and Structured Factorized Projections (SFP) with contrastive learning to reach AUROC within 5-7% of uncompressed baselines on compressed CT volumes across three datasets while using far fewer parameters.
Depth Adaptive Efficient Visual Autoregressive Modeling · cs.CV · 2026-04-19 · unverdicted · none · ref 28
DepthVAR adaptively allocates per-token computational depth in VAR models using a cyclic rotated scheduler and dynamic layer masking to achieve 2.3-3.1x inference speedup with minimal quality loss.
Learning Robustness at Test-Time from a Non-Robust Teacher · cs.CV · 2026-04-13 · unverdicted · none · ref 16
A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.
Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse · cs.CV · 2026-04-06 · unverdicted · none · ref 3
Chorus accelerates video DiT serving by up to 45% via inter-request caching reuse in a three-stage denoising strategy with token-guided attention amplification.
Training a Student Expert via Semi-Supervised Foundation Model Distillation · cs.CV · 2026-04-04 · conditional · none · ref 19
A semi-supervised framework distills vision foundation models into compact instance segmentation experts that outperform their teachers by up to 11.9 AP on Cityscapes and 8.6 AP on ADE20K while being 11 times smaller.
MuDD: A Multimodal Deception Detection Dataset and GSR-Guided Progressive Distillation for Non-Contact Deception Detection · cs.CV · 2026-03-27 · unverdicted · none · ref 17
MuDD dataset plus GSR-guided progressive distillation with dynamic routing achieves state-of-the-art non-contact deception detection and concealed-digit identification.
DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression · cs.CV · 2026-03-05 · conditional · none · ref 11
DARK distillation lets a 75M-parameter student model match or exceed a 427M-parameter teacher on fetal ultrasound benchmarks by transitioning from imitating to repelling non-target similarities.
Test-Time Distillation for Continual Model Adaptation · cs.CV · 2025-06-03 · conditional · none · ref 15
CoDiRe blends VLM and target model predictions via MSP-based weighting and Optimal Transport rectification to enable stable continual test-time adaptation, outperforming CoTTA by 10.55% on ImageNet-C at 48% of the compute cost.
OD3: Optimization-free Dataset Distillation for Object Detection · cs.CV · 2025-06-02 · unverdicted · none · ref 11
OD3 presents an optimization-free dataset distillation framework for object detection that reports new state-of-the-art accuracy on COCO and VOC at compression ratios from 0.25% to 5%.
Deep Multimodal Learning with Missing Modality: A Survey · cs.CV · 2024-09-12 · unverdicted · none · ref 21
This survey provides the first comprehensive overview of deep multimodal learning methods designed to remain robust when some input modalities are absent.
NetTailor: Tuning the Architecture, Not Just the Weights · cs.CV · 2019-06-29 · unverdicted · none · ref 21
NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications · cs.CV · 2017-04-17 · accept · none · ref 9
MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.
Distilling Temporal Coherence into 2D Networks for Transrectal Ultrasound Prostate Video Segmentation · cs.CV · 2026-06-30 · unverdicted · none · ref 6
A Temporally Consistent Learning Framework distills temporal coherence into 2D networks for real-time TRUS prostate video segmentation via confidence-weighted consistency, dual-scale prototype alignment, and geometric pseudo-labeling.
Unveiling Transferability in Trajectory Prediction via Latent Scene Embeddings · cs.CV · 2026-06-29 · unverdicted · none · ref 43
Framework learns latent scene embeddings from 24 trajectory datasets to produce transferability scores that correlate with cross-dataset model performance.
Data Provenance for Image Auto-Regressive Generation · cs.CV · 2026-06-22 · unverdicted · none · ref 109
A post-hoc detection framework exploits generation-induced patterns in autoregressive image outputs to enable provenance tracing across multiple IAR models without altering the generation process.
Changing Modalities: Adapting Remote Sensing Models to New Satellites and Sensors · cs.CV · 2026-06-22 · unverdicted · none · ref 24
DeluluNet enables continued prediction under modality substitution, addition, or subsets by training a multi-modal model from a unimodal teacher and unlabeled multimodal data via modality hallucination.
Generative Relightable Avatars · cs.CV · 2026-06-21 · unverdicted · none · ref 10
GRA combines UV-space material optimization and physics rendering with feed-forward texture refinement and a fine-tuned video-to-video diffusion model to achieve controllable, high-detail relighting of full-body avatars.
Curvature-Adaptive Consistency Flow Matching: Autonomous Trajectory Optimization via Reinforcement Learning · cs.CV · 2026-06-21 · unverdicted · none · ref 16
CACFM applies RL to adaptively select critical regions in probability flow ODE trajectories for consistency distillation, yielding SOTA few-step results on FLUX and SDXL.
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance · cs.CV · 2026-06-17 · unverdicted · none · ref 13
Moebius introduces a compressed diffusion inpainting model using Local-λ Mix Interaction blocks and latent-space multi-granularity distillation to reach 10B-level quality with 0.22B parameters.
Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning · cs.CV · 2026-06-17 · unverdicted · none · ref 10
Visual-OPSD distills reasoning from a privileged visual-thought teacher to a text-only student using on-policy JSD, delivering +3.40pp accuracy gain and 14.3x speedup over the generative teacher on nine benchmarks.
Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video · cs.CV · 2026-06-11 · unverdicted · none · ref 44
BabyMind improves forced-choice word grounding accuracy by 2.6 points over CVCL on SAYCam-S by using offline object masks, short-term tracking into object files, and prototype-space multiple-instance contrastive learning.
Multi-View In-Cabin Monitoring System for Public Transport Vehicles · cs.CV · 2026-06-10 · unverdicted · none · ref 31
Introduces a 9136-sample multi-view in-cabin dataset from a German city bus with RGB, depth, LiDAR, 3D annotations via pseudo-labeling, nuScenes conversion, and benchmarks on models like BEVFusion.
FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection · cs.CV · 2026-06-10 · unverdicted · none · ref 9
FreqKD uses strict MSE on low-frequency features and relaxed log-MSE (weight 0.1) on high-frequency features for RGB-to-IR distillation, reporting 2.4 mAP50 gain on KAIST pedestrian detection with transfer to other datasets and tasks.
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions · cs.CV · 2026-06-08 · unverdicted · none · ref 20
Z-Reward trains a 27B reasoning teacher VLM on score distributions via GDSO and distills it via RISD into a 9B student, reaching 89.6% and 88.6% human preference accuracy with 41.3% optimization gain over SFT baseline.
LRMIL: Efficient Low-Resolution Multiple Instance Learning via High-Resolution Knowledge Distillation for Whole Slide Image Classification · cs.CV · 2026-06-05 · unverdicted · none · ref 8
LRMIL employs a two-stage high-to-low resolution knowledge distillation strategy to train an efficient low-resolution MIL model for WSI classification that outperforms existing methods with lower computational cost.
Knowledge Distillation for Visual Autoregressive Models · cs.CV · 2026-06-04 · unverdicted · none · ref 12
VarKD is a distillation framework for visual AR models that uses student samples and selective teacher supervision to reduce token ambiguity, outperforming prior baselines on ImageNet.
ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation · cs.CV · 2026-06-04 · unverdicted · none · ref 10
ViCuR introduces recoverable visual cues as teacher privilege in multimodal on-policy distillation, yielding +1.19 to +1.24 average gains over answer-based baselines across seven benchmarks with Qwen3-VL students.
Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models · cs.CV · 2026-06-03 · unverdicted · none · ref 12
OGKD injects inter-class geometry into teacher targets for two distillation losses (GAD on global tokens, LGD on patches) and reports 1.7-2.8% average accuracy gains over prior VLM adaptation methods on 11 medical datasets.
Video-Mirai: Autoregressive Video Diffusion Models Need Foresight · cs.CV · 2026-06-02 · unverdicted · none · ref 14
Training method distills non-causal future targets into causal video diffusion states to boost long-horizon consistency without changing inference architecture or cost.
Pathway-Structured Privileged Distillation for Deployable Computational Pathology · cs.CV · 2026-06-01 · unverdicted · none · ref 26
MoPE is a privileged distillation framework that transfers RNA-derived pathway supervision to histology experts via memory-usage alignment, improving whole-slide image only inference on cancer benchmarks.
Pool-Select-Refine for Allocation-Aware Generative Dataset Distillation · cs.CV · 2026-06-01 · unverdicted · none · ref 14
A two-stage framework that decouples generation, selection, and refinement to improve budget use in diffusion-based dataset distillation.
Single-Channel Tissue Segmentation via Cross-Modal Distillation from Foundation Models · cs.CV · 2026-05-30 · conditional · none · ref 4
Cross-modal distillation from multiplexed SAM/CellSAM teachers to single-channel U-Net students yields ~13 Dice point gains on TissueNet, recovering 88% of teacher performance with 23x fewer parameters.
DiffCrossGait: Trajectory-Level Alignment for 2D-3D Cross-Modal Gait Recognition via Latent Diffusion · cs.CV · 2026-05-29 · unverdicted · none · ref 91
DiffCrossGait reformulates 2D-3D gait recognition as trajectory-level alignment in an identity-relevant latent diffusion space using a Tri-Phase Alignment Strategy and achieves state-of-the-art results on SUSTech1K and FreeGait.
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies · cs.CV · 2026-05-28 · unverdicted · none · ref 18
VISUALTHINK-VLA uses visual evidence tokens and selective routing to reach top success rates on VLA benchmarks while cutting reasoning latency from multi-second to sub-second levels.