MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
verdicts
UNVERDICTED 3representative citing papers
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.
citing papers explorer
-
Image Captioning via Compact Bidirectional Architecture
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.