Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

Chi Zhang; Hongxuan Ma; Jiangyuan Liu; Wei Sui; Wei Zou; Yuhao Zhao; Yuxin Guo

arxiv: 2502.14616 · v2 · pith:PWU37SOInew · submitted 2025-02-20 · 💻 cs.CV

Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou

classification 💻 cs.CV

keywords transparent · depth · monocular · objects · only · tasks · challenging · estimation

Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting transparent objects and estimating their depth remains challenging because of their complex optical properties. Existing methods mostly address only one of the two tasks, rely on extra inputs or specialized sensors, and neglect both the valuable interactions between tasks and any subsequent refinement, which leads to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, the first to excel at both segmentation and depth estimation of transparent objects from a single input image. Specifically, we devise a novel semantic and geometric fusion module that effectively integrates multi-scale information between the tasks. In addition, drawing inspiration from human perception of objects, we incorporate an iterative strategy that progressively refines the initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input. Code and models are publicly available at https://github.com/L-J-Yuan/MODEST.
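
The abstract describes two ideas: exchanging information between a semantic (segmentation) branch and a geometric (depth) branch, and repeating that exchange so the features are refined step by step. The toy sketch below illustrates only that general pattern. It is not the MODEST architecture: the names `fuse_step` and `iterative_refine`, the sigmoid gating, the `strength` parameter, and the use of plain Python lists in place of multi-scale feature maps are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a simple soft gate."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_step(semantic, geometric, strength=0.5):
    """One hypothetical fusion step (an assumption, not the paper's module).

    Each branch is replaced by a convex blend of itself and the other
    branch. The blend weight is a gate computed from the other branch.
    """
    gate_from_geo = [strength * sigmoid(g) for g in geometric]
    gate_from_sem = [strength * sigmoid(s) for s in semantic]
    new_semantic = [
        (1 - a) * s + a * g
        for s, g, a in zip(semantic, geometric, gate_from_geo)
    ]
    new_geometric = [
        (1 - b) * g + b * s
        for s, g, b in zip(semantic, geometric, gate_from_sem)
    ]
    return new_semantic, new_geometric


def iterative_refine(semantic, geometric, steps=3):
    """Apply the fusion step repeatedly and keep every intermediate state."""
    history = [(list(semantic), list(geometric))]
    for _ in range(steps):
        semantic, geometric = fuse_step(semantic, geometric)
        history.append((semantic, geometric))
    return history


if __name__ == "__main__":
    sem = [1.0, 0.0, -1.0]   # stand-in for segmentation features
    geo = [0.0, 2.0, 0.5]    # stand-in for depth features
    for step, (s, g) in enumerate(iterative_refine(sem, geo)):
        print(step, [round(v, 3) for v in s], [round(v, 3) for v in g])
```

In this toy the two feature vectors move toward each other at every step because each update is a convex blend. That behaviour is a property of the sketch alone and says nothing about how the published model behaves; the released code at the URL above is the reference for the actual method.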

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AISPO: Enhancing Depth Reliability for Robotic Manipulation of Non-Lambertian Objects via Affine-Invariant Shape Prior
cs.RO · 2026-06 · unverdicted · novelty 4.0

AISPO proposes a depth completion method using multi-scale RGB-D fusion and an affine-invariant shape prior to improve depth reliability and manipulation success for non-Lambertian objects.

Trans2Occ: Voxel Occupancy Estimation and Grasp for Transparent Objects from Simulation to Reality
cs.RO · 2026-06 · unverdicted · novelty 4.0

A simulation-trained model predicts voxel occupancy from single RGB views for transparent object grasping and transfers to real robotic setups without fine-tuning.