MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.
citing papers explorer
-
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
-
Routing-Based Continual Learning for Multimodal Large Language Models
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
-
Image Captioning via Compact Bidirectional Architecture
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.