pith. machine review for the scientific record.

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 2

citation-polarity summary

fields

cs.CV 9 · cs.AI 1

polarities

use dataset 2

representative citing papers

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

cs.CV · 2024-08-03 · conditional · novelty 5.0

MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.
