MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
Mmm-rs: A multi-modal, multi-gsd, multi-scene remote sensing dataset and benchmark for text-to-image generation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2representative citing papers
GeoR-Bench shows top multimodal models reach only 42.7% strict accuracy on geoscience visual reasoning tasks while open-source models reach 10.3%, with outputs often visually plausible yet scientifically inaccurate.
citing papers explorer
-
MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling
MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
-
GeoR-Bench: Evaluating Geoscience Visual Reasoning
GeoR-Bench shows top multimodal models reach only 42.7% strict accuracy on geoscience visual reasoning tasks while open-source models reach 10.3%, with outputs often visually plausible yet scientifically inaccurate.