RemoteCLIP: A vision language foundation model for remote sensing

2024 · arXiv 2024.339083

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping

cs.CV · 2025-11-11 · unverdicted · novelty 6.0

LandSegmenter creates a task-specific foundation model for LULC mapping using weak labels from existing products, an RS adapter, text encoder, and confidence-guided fusion to achieve competitive zero-shot performance across modalities and taxonomies.

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

Lightweight multimodal projector alignment transfers RGB VLMs to thermal drone imagery, achieving F1 scores of 0.915-0.968 for deer, rhino, and elephant recognition plus high enumeration accuracy and habitat context interpretation on a real drone dataset.

ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation

cs.CV · 2025-08-14 · unverdicted · novelty 5.0

ChatENV fine-tunes Qwen-2.5-VL on a 177k-image dataset of temporal satellite pairs with sensor metadata to support interactive temporal and what-if reasoning for environmental monitoring.

citing papers explorer

Showing 3 of 3 citing papers.

LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping · cs.CV · 2025-11-11 · unverdicted · none · ref 4
Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery · cs.CV · 2026-04-07 · unverdicted · none · ref 16
ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation · cs.CV · 2025-08-14 · unverdicted · none · ref 25
