AI Challenger: A large-scale dataset for going deeper in image understanding
3 Pith papers cite this work.
Citation-role summary: dataset (1)
Citation-polarity summary: use dataset (1)
Fields: cs.CV (3)
citing papers explorer
- FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language Modeling
  FTibSuite provides human-verified multimodal corpora, Tibetan-adapted benchmarks with quality controls, and a baseline VLM showing gains on tasks like MMBench while preserving Chinese capabilities.
- MiniCPM-V: A GPT-4V Level MLLM on Your Phone
  MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.
- Again-Pose: Anchor-Guided Adaptive Inter-Frame Motion Cues Propagating for High-quality Human Pose Reconstruction
  Again-Pose improves 3D human pose reconstruction from degraded videos by using anchor frames to guide adaptive propagation of inter-frame motion cues via dual-path and fusion modules.