DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding

DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding · 2025 · arXiv 2505.21076

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 2

citation-polarity summary

baseline 2

representative citing papers

SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

SenseBench is the first physics-based benchmark with 10K+ instances and dual protocols to evaluate VLMs on remote sensing low-level perception and diagnostic description, revealing domain bias and specific failure modes.

Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model

eess.IV · 2026-05-11 · unverdicted · novelty 6.0

Introduces the SMART-HC-VQA dataset with 65k single-image and 2.3M temporal VQA examples plus an adapted LLaVA-NeXT MLLM framework for geospatial-temporal sensemaking of remote sensing construction activity.

Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Delta-LLaVA adds Change-Enhanced Attention, Change-SEG with prior embeddings, and Local Causal Attention to MLLMs to overcome temporal blindness, outperforming general models on a new unified benchmark for bi- and tri-temporal remote sensing tasks.

UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA

cs.CV · 2026-06-10 · unverdicted · novelty 4.0

UniReason-Med introduces a unified framework for 2D and 3D medical VQA with shared grounded reasoning, trained on a 220K dataset, claiming that joint 2D+3D supervision improves 3D performance over 3D-only training.

citing papers explorer

SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models · cs.CV · 2026-05-11 · unverdicted · none · ref 24
Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model · eess.IV · 2026-05-11 · unverdicted · none · ref 4
Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models · cs.CV · 2026-04-15 · unverdicted · none · ref 48
UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA · cs.CV · 2026-06-10 · unverdicted · none · ref 122
