ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Hengshuang Zhao; Jianping Shi; Jiaya Jia; Xiaojuan Qi; Xiaoyong Shen

arxiv: 1704.08545 · v2 · pith:AX2K4ZNUnew · submitted 2017-04-27 · 💻 cs.CV

classification 💻 cs.CV

keywords: real-time, segmentation, cascade, challenging, icnet, inference, label, semantic

We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet faces the fundamental difficulty of reducing a large portion of the computation needed for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card, with decent-quality results on challenging datasets such as Cityscapes, CamVid and COCO-Stuff.
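
The abstract names two ideas, multi-resolution branches and a cascade feature fusion unit, without giving their details. As a rough illustration only, the toy sketch below builds a three-level image pyramid and fuses it from coarse to fine. The scales (1, 1/2, 1/4), the 2x2 average-pool downsampling, the nearest-neighbour upsampling, and the sum-then-ReLU fusion are all assumptions made for this sketch, and the function names are hypothetical. The learned convolutions that a real network would apply in each branch and inside the fusion unit are omitted, so each branch here is just the identity.

```python
def downsample2(img):
    """Halve resolution by averaging each 2x2 block (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def upsample2(feat):
    """Double resolution by repeating every value twice in each direction."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(low, high):
    """Toy fusion step (assumption): upsample the coarse map, add the finer map, apply ReLU."""
    up = upsample2(low)
    return [[max(0.0, a + b) for a, b in zip(row_up, row_high)]
            for row_up, row_high in zip(up, high)]


def cascade(img):
    """Run the coarse-to-fine cascade over assumed scales 1, 1/2 and 1/4."""
    full = img
    half = downsample2(full)
    quarter = downsample2(half)
    fused_half = fuse(quarter, half)   # coarse result refined at 1/2 scale
    return fuse(fused_half, full)      # then refined again at full scale


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    result = cascade(image)
    print(len(result), len(result[0]))
```

The point of the cascade ordering is that the most expensive processing can be spent on the smallest input, while the higher-resolution branches only refine the coarse result. The sketch shows that data flow, not the paper's actual layers.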

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Separable Convolutional LSTMs for Faster Video Segmentation
cs.CV 2019-07 unverdicted novelty 6.0

Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.

Efficient Segmentation: Learning Downsampling Near Semantic Boundaries
cs.CV 2019-07 unverdicted novelty 4.0

Learned adaptive downsampling for semantic segmentation that prioritizes locations near semantic boundaries to improve the accuracy-efficiency trade-off over uniform sampling.

ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation
cs.CV 2019-06 unverdicted novelty 4.0

ESNet is a lightweight symmetric CNN using factorized residual units and parallel dilated convolutions that reaches over 62 FPS semantic segmentation on Cityscapes with 1.6M parameters.