arxiv: 2502.14616 · v2 · pith:PWU37SOInew · submitted 2025-02-20 · 💻 cs.CV

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

classification 💻 cs.CV
keywords transparent, depth, monocular, objects, only, tasks, challenging, estimation

Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single-image input. Specifically, we devise a novel semantic and geometric fusion module, effectively integrating the multi-scale information between tasks. In addition, drawing inspiration from human perception of objects, we further incorporate an iterative strategy, which progressively refines initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input. Codes and models are publicly available at https://github.com/L-J-Yuan/MODEST.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AISPO: Enhancing Depth Reliability for Robotic Manipulation of Non-Lambertian Objects via Affine-Invariant Shape Prior

    cs.RO 2026-06 unverdicted novelty 4.0

    AISPO proposes a depth completion method using multi-scale RGB-D fusion and an affine-invariant shape prior to improve depth reliability and manipulation success for non-Lambertian objects.

  2. Trans2Occ: Voxel Occupancy Estimation and Grasp for Transparent Objects from Simulation to Reality

    cs.RO 2026-06 unverdicted novelty 4.0

    A simulation-trained model predicts voxel occupancy from single RGB views for transparent object grasping and transfers to real robotic setups without fine-tuning.