arxiv: 1610.02242 · v3 · pith:NZJQPK66new · submitted 2016-10-07 · 💻 cs.NE · cs.LG

Temporal Ensembling for Semi-Supervised Learning

classification 💻 cs.NE cs.LG
keywords: labels, training, semi-supervised, classification, different, images, learning, method

In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
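
The consensus prediction described in the abstract can be illustrated with a minimal sketch. It assumes the ensemble is kept as an exponential moving average of per-epoch network outputs, with a startup bias correction, and that the unsupervised term is a mean squared difference between current outputs and the ensemble targets. The abstract does not state these details, so the averaging rule, the function names `update_ensemble` and `consistency_loss`, and the momentum value `alpha=0.6` are illustrative assumptions rather than a description of the authors' implementation.

```python
import numpy as np


def update_ensemble(Z, z_epoch, alpha, epoch):
    """Blend this epoch's predictions into the running ensemble.

    Z       : running accumulation of past predictions, shape (N, C)
    z_epoch : network outputs for all N samples this epoch, shape (N, C)
    alpha   : momentum in [0, 1) (assumed hyperparameter)
    epoch   : 1-based epoch index, used for startup bias correction
    Returns the updated accumulator and the bias-corrected targets.
    """
    Z = alpha * Z + (1.0 - alpha) * z_epoch
    # Z starts at zero, so early values are biased toward zero;
    # dividing by (1 - alpha^epoch) removes that bias.
    targets = Z / (1.0 - alpha ** epoch)
    return Z, targets


def consistency_loss(z_epoch, targets):
    """Mean squared difference between current outputs and ensemble targets."""
    return float(np.mean((z_epoch - targets) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_classes, alpha = 4, 3, 0.6  # illustrative sizes only

    # Stand-in for a network: a fixed "clean" prediction plus per-epoch noise,
    # mimicking different dropout and augmentation conditions on each pass.
    clean = rng.dirichlet(np.ones(n_classes), size=n_samples)
    Z = np.zeros((n_samples, n_classes))

    for epoch in range(1, 11):
        z_epoch = clean + rng.normal(scale=0.1, size=clean.shape)
        Z, targets = update_ensemble(Z, z_epoch, alpha, epoch)
        print(epoch, round(consistency_loss(z_epoch, targets), 5))
```

In a real training loop the targets from previous epochs would be treated as constants when computing gradients, and the consistency term would be added to the supervised loss on the labeled subset.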

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revisiting Shadow Detection from a Vision-Language Perspective

    cs.CV 2026-05 unverdicted novelty 7.0

    SVL uses language embeddings aligned with global image representations via shadow ratio regression and global-to-local coupling to improve shadow detection robustness in ambiguous cases.

  2. Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair

    cs.LG 2026-04 unverdicted novelty 6.0

    Any supervised encoder must retain sensitivity along label-correlated directions, unifying non-robust features, texture bias, corruption fragility, and the robustness-accuracy tradeoff, and this is measurable and part...

  3. Revisiting Feature Prediction for Learning Visual Representations from Video

    cs.CV 2024-02 conditional novelty 6.0

    V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.

  4. Text-Video Retrieval With Global-Local Contrastive Consistency Learning

    cs.IR 2026-05 unverdicted novelty 5.0

    GLCCL uses a Global-Local Interaction Module and Contrastive Score Consistency loss to align text and video semantics more efficiently than attention-based methods on MSR-VTT, DiDeMo, and VATEX.

  5. Information theoretic underpinning of self-supervised learning by clustering

    cs.LG 2026-05 unverdicted novelty 5.0

    SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.

  6. ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision

    cs.CV 2026-05 unverdicted novelty 5.0

    ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.

  7. PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning

    cs.CV 2024-09 unverdicted novelty 4.0

    PEPL refines pseudo-labels via CAM-based semantic estimation in two phases to reach state-of-the-art accuracy in semi-supervised fine-grained image classification.

  8. Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Frame Work to Solve Face Related Tasks, Multi Task Learning, Semi-Supervised Learning

    cs.CV 2019-07 unverdicted novelty 4.0

    Proposes Distill-2MD-MTL, an MTL-based data distillation framework for semi-supervised multi-domain face analysis tasks that claims better performance than single-task baselines.