International Journal of Computer Vision (IJCV)

Learning to Prompt for Vision-Language Models

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

cs.CV · 2023-12-28 · conditional · novelty 7.0

Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.

LLaVA-Video: Video Instruction Tuning With Synthetic Data

cs.CV · 2024-10-03 · unverdicted · novelty 6.0

LLaVA-Video-178K is a new synthetic video instruction dataset that, when combined with existing data to train LLaVA-Video, produces strong results on video understanding benchmarks.

Text-Guided Multi-Scale Frequency Representation Adaptation

cs.CV · 2026-05-05 · unverdicted · novelty 5.0

FreqAdapter adapts multimodal models by text-guided multi-scale fine-tuning in the frequency domain, claiming better performance and efficiency than signal-space PEFT methods.

From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

VLMs recover reliable population-level trends in climate change visual discourse on social media even when per-image accuracy is only moderate.

citing papers explorer

Showing 4 of 4 citing papers.

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
cs.CV · 2023-12-28 · conditional · none · ref 166
LLaVA-Video: Video Instruction Tuning With Synthetic Data
cs.CV · 2024-10-03 · unverdicted · none · ref 86
Text-Guided Multi-Scale Frequency Representation Adaptation
cs.CV · 2026-05-05 · unverdicted · none · ref 13
From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
cs.CV · 2026-04-23 · unverdicted · none · ref 4
