iBOT: Image BERT Pre-Training with Online Tokenizer
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models
  SSL-R1 reformulates visual SSL tasks into verifiable puzzles to supply rewards for RL post-training of MLLMs, yielding gains on multimodal benchmarks without external supervision.
- MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning
  MultiModalPFN extends TabPFN with modality projectors, a multi-head gated MLP, and a cross-attention pooler to unify tabular and non-tabular inputs, outperforming prior methods on medical and general multimodal datasets.