SDRT: Enhance Vision-Language Models by Self-Distillation with Diverse Reasoning Traces
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: method (1)
Citation-polarity summary: use method (1)
Years: 2026 (2)
Representative citing papers
- Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation
  Visual debiasing of omni-modal benchmarks combined with staged post-training lets a 3B model match or exceed a 30B model without a stronger teacher.
- Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation