The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.
arXiv preprint arXiv:2312.17240 (2023)
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 4years
2026 4verdicts
UNVERDICTED 4representative citing papers
The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
Tarot-SAM3 delivers a training-free pipeline for segmenting images from arbitrary referring expressions via expression reasoning prompts and DINOv3-based mask self-refinement.
citing papers explorer
-
PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation
The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.
-
Online Reasoning Video Object Segmentation
The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.
-
WildDet3D: Scaling Promptable 3D Detection in the Wild
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
-
Tarot-SAM3: Training-free SAM3 for Any Referring Expression Segmentation
Tarot-SAM3 delivers a training-free pipeline for segmenting images from arbitrary referring expressions via expression reasoning prompts and DINOv3-based mask self-refinement.