SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving state-of-the-art results on most benchmarks without relying on augmentations.
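The summary only gestures at the mechanism, so below is a minimal sketch of cross-view nearest-neighbor alignment in that spirit, assuming a PyTorch setup. The function name, the column split, and the cosine alignment loss are illustrative assumptions, not the paper's exact objective.

```python
# Sketch of nearest-neighbor cross-view alignment -- an assumption-based
# illustration of the idea described above, not the paper's actual loss.
import torch
import torch.nn.functional as F

def nn_alignment_loss(z_a, z_b):
    """Align each row of z_a to its nearest neighbor among the z_b embeddings.

    z_a: (batch, dim) embeddings of view A (e.g., one subset of columns)
    z_b: (batch, dim) embeddings of view B (the complementary columns)
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    sim = z_a @ z_b.t()                    # (batch, batch) cosine similarities
    nn_idx = sim.argmax(dim=1)             # nearest B-neighbor of each A-row
    targets = z_b[nn_idx].detach()         # stop-gradient on the target view
    return (1 - (z_a * targets).sum(dim=1)).mean()  # cosine-distance alignment

# Usage sketch: split a tabular batch x into two complementary column views,
# encode each with its own (hypothetical) MLP encoder, then align:
# loss = nn_alignment_loss(enc_a(x[:, :k]), enc_b(x[:, k:]))
```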
4 Pith papers cite this work.
Representative citing papers
- A Simple Framework for Contrastive Learning of Visual Representations
  SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50 (a minimal sketch of its contrastive loss follows this list).
- A Semi-Supervised Framework for Speech Confidence Detection using Whisper
  A hybrid semi-supervised framework fusing Whisper embeddings with acoustic and prosodic features achieves 0.751 Macro-F1 for speaker confidence detection and outperforms baselines including WavLM, HuBERT, and Wav2Vec 2.0.
- Revisiting Feature Prediction for Learning Visual Representations from Video
  V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
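For reference, here is a minimal sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss that SimCLR uses to contrast augmented views. This is a common reference formulation, not code from the paper, and the default temperature is an assumption.

```python
# Minimal NT-Xent loss sketch in the SimCLR style.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) projections of two augmented views of the same images."""
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2*batch, dim)
    sim = z @ z.t() / temperature                       # pairwise cosine logits
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # Positives: row i of view 1 pairs with row i of view 2, and vice versa.
    targets = torch.cat([torch.arange(batch, device=z.device) + batch,
                         torch.arange(batch, device=z.device)])
    return F.cross_entropy(sim, targets)
```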