RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
verdicts
UNVERDICTED 3
representative citing papers
Lightweight multimodal projector alignment transfers RGB VLMs to thermal drone imagery, achieving F1 scores of 0.915-0.968 for deer, rhino, and elephant recognition plus high enumeration accuracy and habitat context interpretation on a real drone dataset.
ChatENV fine-tunes Qwen-2.5-VL on a 177k-image dataset of temporal satellite pairs with sensor metadata to support interactive temporal and what-if reasoning for environmental monitoring.
citing papers explorer
- LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping
  LandSegmenter creates a task-specific foundation model for LULC mapping using weak labels from existing products, an RS adapter, text encoder, and confidence-guided fusion to achieve competitive zero-shot performance across modalities and taxonomies.
- Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery
  Lightweight multimodal projector alignment transfers RGB VLMs to thermal drone imagery, achieving F1 scores of 0.915-0.968 for deer, rhino, and elephant recognition plus high enumeration accuracy and habitat context interpretation on a real drone dataset.
- ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation
  ChatENV fine-tunes Qwen-2.5-VL on a 177k-image dataset of temporal satellite pairs with sensor metadata to support interactive temporal and what-if reasoning for environmental monitoring.