Enhancing cognition and explainability of multimodal foundation models with self-synthesized data. arXiv preprint arXiv:2502.14044, 2025.

Yucheng Shi, Quanzheng Li, Jin Sun, Xiang Li, Ninghao Liu · 2025 · arXiv 2502.14044

2 Pith papers cite this work.

representative citing papers

Self-Improving Small Object Grounding in LVLMs

cs.CV · 2026-06-01 · unverdicted · novelty 6.0 · ref 29

Attention maps in LVLMs enable an IoU regressor (Pearson r > 0.67) and a training-free entropy-based selector that improves small-object localization by up to 19% on COCO and Objects365.

Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning

cs.CV · 2026-02-07 · unverdicted · novelty 6.0 · ref 22

Fine-R1 uses chain-of-thought supervised fine-tuning on a structured FGVR reasoning dataset plus triplet augmented policy optimization to outperform general MLLMs and CLIP models on seen and unseen fine-grained categories with 4-shot training.
