A Simple Framework for Contrastive Learning of Visual Representations
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
DINOv3 pretraining yields no frozen advantage and underperforms ImageNet on X-ray but improves convergence and final performance after full finetuning on RGB industrial inspection tasks.
citing papers explorer
- Grounding Hierarchical Vision-Language-Action Models Through Explicit Language-Action Alignment
A contrastive alignment model plus offline preference learning explicitly grounds hierarchical VLA language descriptions to actions and visuals on LanguageTable, achieving performance comparable to fully supervised fine-tuning while reducing annotation needs.
- Rethinking Transfer Learning for Industrial Inspection: DINOv3 vs. ImageNet Pretraining Across RGB and X-ray Tasks
DINOv3 pretraining yields no frozen advantage and underperforms ImageNet on X-ray but improves convergence and final performance after full finetuning on RGB industrial inspection tasks.