XTuner: A Toolkit for Efficiently Fine-tuning LLM
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: baseline 1
Citation-polarity summary: baseline 1
Citing papers
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models
  OCRBench provides the largest evaluation suite yet for OCR capabilities in large multimodal models, revealing gaps in multilingual, handwritten, and mathematical text handling.
- If Concept Bottlenecks are the Question, are Foundation Models the Answer?
  Empirical tests of VLM-CBMs show that VLM supervision differs from expert annotations depending on the task, and that concept accuracy correlates only weakly with quality metrics.