VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.
Visual instruction tuning, 2023
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LongVideo-R1 trains a reasoning agent on 33K trajectories to intelligently select informative video clips via iterative refinement and RL, achieving better accuracy-efficiency tradeoffs on long video QA benchmarks.
Consensus Entropy measures inter-VLM output agreement to verify OCR reliability and enable self-improving ensembles, yielding 42.1% F1 gains over single-model judging.
OVOD-Agent models visual reasoning as a weakly Markovian decision process with bandit-driven exploration to create a self-evolving open-vocabulary detector that improves on rare categories in COCO and LVIS.
citing papers explorer
-
VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing
VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.
-
LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
LongVideo-R1 trains a reasoning agent on 33K trajectories to intelligently select informative video clips via iterative refinement and RL, achieving better accuracy-efficiency tradeoffs on long video QA benchmarks.
-
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Consensus Entropy measures inter-VLM output agreement to verify OCR reliability and enable self-improving ensembles, yielding 42.1% F1 gains over single-model judging.
-
OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection
OVOD-Agent models visual reasoning as a weakly Markovian decision process with bandit-driven exploration to create a self-evolving open-vocabulary detector that improves on rare categories in COCO and LVIS.