The color of the cat is gray: 1 million full-sentences visual question answering (FSVQA)

Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada · 2016 · cs.CV · arXiv 1609.06657

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Visual Question Answering (VQA) task has showcased a new stage of interaction between language and vision, two of the most pivotal components of artificial intelligence. However, it has mostly focused on generating short and repetitive answers, mostly single words, which fall short of rich linguistic capabilities of humans. We introduce Full-Sentence Visual Question Answering (FSVQA) dataset, consisting of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to original VQA dataset and captions in the MS COCO dataset. This poses many additional complexities to conventional VQA task, and we provide a baseline for approaching and evaluating the task, on top of which we invite the research community to build further improvements.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Brain-IT-VQA: From Brain Signals to Answers

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

Brain-IT-VQA decodes visual question answers from fMRI using a transformer to extract language tokens and introduces the NSD-VQA benchmark with 20 controlled questions per image across 20 categories.

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

cs.CV · 2024-08-03 · conditional · novelty 5.0

MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.

citing papers explorer

Showing 1 of 1 citing paper after filters.

MiniCPM-V: A GPT-4V Level MLLM on Your Phone cs.CV · 2024-08-03 · conditional · none · ref 90
MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.

The color of the cat is gray: 1 million full-sentences visual question answering (FSVQA)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer