hub Canonical reference

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Canonical reference. 71% of classified citations from Pith papers cite this work as background.

19 Pith papers citing it
Background 71% of classified citations

citation-role summary

background 5 · baseline 1 · method 1

representative citing papers

LLM Agents Can See Code Repositories

cs.SE · 2026-06-12 · unverdicted · novelty 7.0

Visual graphs of repository structure added to text inputs for multimodal LLM agents reduce token consumption by up to 26% while maintaining or improving issue-resolution accuracy.

InstructTable: Improving Table Structure Recognition Through Instructions

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

InstructTable combines instruction-guided pre-training on structural patterns with visual fine-tuning and a template-free synthetic data generator (TME) to reach state-of-the-art table structure recognition on public benchmarks and a new complex-table test set.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
