GRiT: A Generative Region-to-text Transformer for Object Understanding

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

method 2 · background 1

fields

cs.CV 3 · cs.CL 1

years

2026 1 · 2023 3

representative citing papers

VideoChat: Chat-Centric Video Understanding

cs.CV · 2023-05-10 · conditional · novelty 7.0

VideoChat integrates video models and LLMs via a learnable interface for chat-based spatiotemporal and causal video reasoning, trained on a new video-centric instruction dataset.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • VideoChat: Chat-Centric Video Understanding cs.CV · 2023-05-10 · conditional · none · ref 48

    VideoChat integrates video models and LLMs via a learnable interface for chat-based spatiotemporal and causal video reasoning, trained on a new video-centric instruction dataset.

  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) cs.CV · 2023-09-29 · conditional · none · ref 138

    GPT-4V processes interleaved image-text inputs generically and supports visual referring prompting for new human-AI interaction.