pith. sign in

Answer-me: Multi-task open-vocabulary visual question answering

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CV 2 cs.AI 1

years

2024 1 2022 2

representative citing papers

PaLI: A Jointly-Scaled Multilingual Language-Image Model

cs.CV · 2022-09-14 · conditional · novelty 7.0

PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.

citing papers explorer

Showing 3 of 3 citing papers.

  • PaLI: A Jointly-Scaled Multilingual Language-Image Model cs.CV · 2022-09-14 · conditional · none · ref 55

    PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.

  • CoCa: Contrastive Captioners are Image-Text Foundation Models cs.CV · 2022-05-04 · accept · none · ref 34

    CoCa unifies contrastive and generative pretraining in one image-text model to reach 86.3% zero-shot ImageNet accuracy and new state-of-the-art results on multiple downstream benchmarks.

  • ChatSR: Multimodal Large Language Models for Scientific Formula Discovery cs.AI · 2024-06-08 · unverdicted · none · ref 15

    ChatSR aligns scientific data encoders with LLMs to produce formulas that fit data and satisfy explicit priors, reporting SOTA results on 13 symbolic regression benchmarks plus zero-shot handling of unseen prior types.