A pre-fusion calibration module modulates multimodal features using cross-modality support and conflict cues, improving performance on five benchmarks that span sentiment analysis and audio-visual tasks.
Uni-x: Mitigating modality conflict with a two-end-separated architecture for unified multimodal models. arXiv preprint arXiv:2509.24365
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 shows that direct pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive generation and stronger understanding at scale.