Large Scale Distributed Neural Network Training through Online Distillation

2018 · arXiv 1804.03235

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.

Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

FedF-ADMM uses function-space ADMM updates projected via knowledge distillation plus a PI-like stabilization term to deliver faster, more stable convergence and higher accuracy than prior decentralized FL methods under severe non-IID conditions.

Enabling Federated Inference via Unsupervised Consensus Embedding

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.

LatentBurst: A Fast and Efficient Multi Frame Super-Resolution for Hexadeca-Bayer Pattern CIS images

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

LatentBurst is a new multi-frame super-resolution network for hexadeca-Bayer CIS images that uses pyramid latent alignment, an efficient UNet, and two-step knowledge distillation to handle motion and run on mobile devices.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models

cs.LG · 2025-07-25 · unverdicted · novelty 6.0

Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

FedKDNAS combines client-side neural architecture search with knowledge distillation from aggregated server predictions to improve accuracy and efficiency in heterogeneous federated learning.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

cs.CL · 2025-07-07 · unverdicted · novelty 4.0

Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.

Gemma 3 Technical Report

cs.CL · 2025-03-25 · accept · novelty 4.0

Gemma 3 introduces multimodal open models with architectural changes for efficient long context, trained via distillation and a new post-training recipe that makes the 4B version competitive with prior 27B models and the 27B version comparable to Gemini-1.5-Pro.

Gemma 2: Improving Open Language Models at a Practical Size

cs.CL · 2024-07-31 · conditional · novelty 3.0

Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.

citing papers explorer

Showing 11 of 11 citing papers.

Emerging Properties in Self-Supervised Vision Transformers · cs.CV · 2021-04-29 · conditional · none · ref 1
Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective · cs.LG · 2026-05-10 · unverdicted · none · ref 22
Enabling Federated Inference via Unsupervised Consensus Embedding · cs.LG · 2026-05-07 · unverdicted · none · ref 17
LatentBurst: A Fast and Efficient Multi Frame Super-Resolution for Hexadeca-Bayer Pattern CIS images · cs.CV · 2026-04-25 · unverdicted · none · ref 25
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering · cs.AI · 2026-04-22 · unverdicted · none · ref 178
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models · cs.LG · 2025-07-25 · unverdicted · none · ref 6
Vision Transformers Need Registers · cs.CV · 2023-09-28 · unverdicted · none · ref 182
Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search · cs.LG · 2026-05-20 · unverdicted · none · ref 35
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities · cs.CL · 2025-07-07 · unverdicted · none · ref 1
Gemma 3 Technical Report · cs.CL · 2025-03-25 · accept · none · ref 2
Gemma 2: Improving Open Language Models at a Practical Size · cs.CL · 2024-07-31 · conditional · none · ref 159
