ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation

Quan Zhou; Xiaofu Wu; Yu Wang

arxiv: 1906.09826 · v1 · pith:NULKT44Ynew · submitted 2019-06-24 · 💻 cs.CV

Pith reviewed 2026-05-25 17:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords real-time semantic segmentation · efficient CNN · Cityscapes · factorized convolutions · symmetric network · deep learning · semantic segmentation

The pith

ESNet's symmetric design of factorized and parallel convolution units enables real-time semantic segmentation with only 1.6 million parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ESNet as a response to the high computational cost of semantic segmentation networks: a nearly symmetric architecture built from factorized convolution units. These units use 1D factorized convolutions in their residual layers, while the parallel versions split into several paths, apply dilated convolutions at different rates, and merge the results. If the design works as claimed, accurate segmentation can run at over 62 frames per second on a single GTX 1080Ti GPU with about 1.6 million parameters, which would make it practical for embedded or real-time systems. The Cityscapes experiments are meant to show that this architecture improves the speed-accuracy frontier relative to prior real-time methods.

Core claim

ESNet consists of a series of factorized convolution units and parallel factorized convolution units that together form a symmetric network. This design achieves state-of-the-art results in the speed and accuracy trade-off for real-time semantic segmentation on the Cityscapes dataset, with the model having nearly 1.6 million parameters and running at over 62 FPS on a GTX 1080Ti GPU.

What carries the argument

The factorized convolution unit (FCU) and parallel factorized convolution unit (PFCU), where PFCU uses a transform-split-transform-merge strategy with dilated convolutions.
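
To make the saving from factorization concrete, here is a minimal sketch that counts the weights of one 3×3 convolution against a 3×1 convolution followed by a 1×3 convolution. The channel width of 64 and the helper names are assumptions chosen for illustration; they are not taken from the paper, and biases and normalization layers are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_ch, out_ch):
    """Weight count of one 2D convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_ch * out_ch


def full_conv_weights(k, channels):
    """One k x k convolution with equal input and output channels."""
    return conv_weights(k, k, channels, channels)


def factorized_conv_weights(k, channels):
    """A k x 1 convolution followed by a 1 x k convolution."""
    vertical = conv_weights(k, 1, channels, channels)
    horizontal = conv_weights(1, k, channels, channels)
    return vertical + horizontal


channels = 64  # hypothetical width, not a value reported in the paper
full = full_conv_weights(3, channels)              # 36864 weights
factorized = factorized_conv_weights(3, channels)  # 24576 weights
print(full, factorized)
```

For a k×k kernel the factorized pair needs 2k instead of k² weights per channel pair, so the 3×3 case keeps two thirds of the weights. The count says nothing about accuracy, which is an empirical question the paper's experiments have to answer.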

If this is right

The low parameter count supports deployment in resource-constrained environments.
Real-time performance above 60 FPS enables applications in video analysis and autonomous systems.
The symmetric structure maintains segmentation accuracy without excessive computation.
Results on Cityscapes suggest the design competes favorably with existing real-time segmentation models.
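
The two headline figures translate into a per-frame time budget and a weight-storage estimate with simple arithmetic. The sketch below assumes 32-bit floating-point weights (4 bytes each), which the source does not state, and it counts weights only, not activations or framework overhead.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def weight_memory_mb(num_params, bytes_per_param=4):
    """Storage for the weights alone, in megabytes (10**6 bytes).

    bytes_per_param=4 assumes float32 weights; this is an assumption,
    not something the paper's abstract specifies.
    """
    return num_params * bytes_per_param / 1e6


print(round(frame_budget_ms(62), 1))          # about 16.1 ms per frame
print(round(weight_memory_mb(1_600_000), 1))  # about 6.4 MB of weights
```

Peak memory during inference is usually dominated by intermediate feature maps at high input resolution, so the weight figure is a lower bound on the real footprint.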

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the symmetric factorized approach to other tasks like instance segmentation could yield similar efficiency gains.
Evaluating the model on diverse hardware platforms would clarify its portability beyond the reported GTX 1080Ti setup.
The use of dilated convolutions in parallel branches may offer insights for receptive field design in other efficient architectures.

Load-bearing premise

The assumption that performance on Cityscapes validation and test sets with one hardware setup sufficiently demonstrates superiority for real-time semantic segmentation in general.

What would settle it

Demonstrating on another dataset or hardware that an alternative network achieves a superior combination of accuracy and frames per second.
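
One way to phrase "a superior combination of accuracy and frames per second" is Pareto dominance: a model dominates another if it is at least as good on both axes and strictly better on one. The sketch below is an illustration only; the `Result` type, the model names, and every number in it are hypothetical placeholders, not results from the paper, and a fair comparison would require all entries to be measured on the same dataset, input resolution, and hardware.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    miou: float  # accuracy, higher is better
    fps: float   # speed, higher is better


def dominates(a, b):
    """True if a is at least as good as b on both axes and better on one."""
    at_least_as_good = a.miou >= b.miou and a.fps >= b.fps
    strictly_better = a.miou > b.miou or a.fps > b.fps
    return at_least_as_good and strictly_better


def pareto_front(results):
    """Results that no other entry dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Hypothetical placeholder measurements, for illustration only.
candidates = [
    Result("model_a", miou=70.0, fps=60.0),
    Result("model_b", miou=68.0, fps=50.0),   # dominated by model_a
    Result("model_c", miou=72.0, fps=30.0),   # slower but more accurate
]
print([r.name for r in pareto_front(candidates)])
```

Under this reading, the paper's claim would be challenged by any alternative that dominates ESNet under matched measurement conditions.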

Figures

Figures reproduced from arXiv: 1906.09826 by Quan Zhou, Xiaofu Wu, Yu Wang.

**Figure 1.** Overall symmetric architecture of the proposed ESNet. The entire network is composed by four components: down-sampling unit, upsampling unit, factorized convolution unit and its parallel version. (Best viewed in color)

**Figure 2.** Comparison of different residual layer modules. From left to right are (a) Non-bottleneck [2], (b) Bottleneck [17], (c) Non-bottleneck-1D [19], (d) FCU and (e) PFCU module. “DConv” denotes the dilated convolution, where r1, r2, and r3 are dilated rates for each split branch, respectively.

**Figure 3.** The visual comparison on CityScapes val dataset. From left to right are input images, ground truth, segmentation outputs from our ESNet, SegNet [8], ENet [17], ERFNet [19], ESPNet [18], ICNet [35], and CGNet [15]. (Best viewed in color)

Original abstract

The recent years have witnessed great advances for semantic segmentation using deep convolutional neural networks (DCNNs). However, a large number of convolutional layers and feature channels lead to semantic segmentation as a computationally heavy task, which is disadvantage to the scenario with limited resources. In this paper, we design an efficient symmetric network, called (ESNet), to address this problem. The whole network has nearly symmetric architecture, which is mainly composed of a series of factorized convolution unit (FCU) and its parallel counterparts (PFCU). On one hand, the FCU adopts a widely-used 1D factorized convolution in residual layers. On the other hand, the parallel version employs a transform-split-transform-merge strategy in the designment of residual module, where the split branch adopts dilated convolutions with different rate to enlarge receptive field. Our model has nearly 1.6M parameters, and is able to be performed over 62 FPS on a single GTX 1080Ti GPU. The experiments demonstrate that our approach achieves state-of-the-art results in terms of speed and accuracy trade-off for real-time semantic segmentation on CityScapes dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ESNet assembles existing factorized and dilated convolution techniques into a symmetric, low-parameter network and claims a useful speed-accuracy point on Cityscapes, but the reported numbers still need to be checked against the paper's tables.

ESNet is an incremental but practical engineering paper that combines factorized convolutions and parallel dilated branches into a symmetric network for real-time semantic segmentation on Cityscapes. The main contribution is the specific design of the FCU and PFCU blocks, where the parallel version splits into branches with different dilation rates to expand the receptive field while keeping the parameter count low, at about 1.6M. This lets it exceed 62 FPS on a 1080Ti.

The symmetric architecture is a reasonable choice for balancing encoder and decoder. The paper addresses the resource-constraint problem with a clear description of the residual modules. The transform-split-transform-merge strategy in PFCU is a straightforward way to add multi-scale context without much overhead. The paper cites prior work on factorized and dilated convolutions appropriately.

The soft spot is that the central performance claims rest on experimental results we can't see in this review. The abstract-only view makes it hard to judge whether the comparisons are fair or whether ablations support the design choices. That said, the claim is scoped to Cityscapes, so the lack of other datasets isn't a big issue for the stated contribution. The dilation rates are free parameters that probably required some search.

This is for practitioners who need a lightweight model for embedded real-time segmentation. Someone looking for theoretical advances or new operators won't find them here. It deserves a serious referee to verify the experimental claims, because if the numbers are accurate it provides a useful recipe even though the ideas are assembled from prior work. I'd bring it to a reading group if the group is focused on efficient vision models.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes ESNet, a symmetric encoder-decoder network for real-time semantic segmentation. The architecture is built from factorized convolution units (FCU) that apply 1D factorized convolutions within residual blocks and parallel factorized convolution units (PFCU) that follow a transform-split-transform-merge pattern using dilated convolutions at multiple rates to expand receptive fields. The authors state that the resulting model contains approximately 1.6 million parameters and achieves more than 62 FPS on a GTX 1080Ti GPU while attaining state-of-the-art speed-accuracy trade-off on the Cityscapes dataset.

Significance. If the reported Cityscapes results hold, the work supplies a concrete, low-parameter architecture that improves the speed-accuracy frontier for real-time segmentation. The explicit parameter count and single-GPU FPS figure, together with the modular FCU/PFCU design, constitute a reproducible engineering contribution that can be directly compared against other lightweight segmentation models.

minor comments (2)

[Abstract] The claim of 'state-of-the-art results' is not accompanied by any numerical accuracy metric (e.g., mIoU); adding the key quantitative numbers would allow readers to evaluate the trade-off immediately.
[Abstract] The description of the PFCU 'transform-split-transform-merge' strategy is terse; a short diagram or a one-sentence statement of how the split branches are recombined before the residual addition would remove ambiguity.
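
For readers unfamiliar with the metric named in the first comment, mean intersection-over-union (mIoU) can be sketched from a per-class confusion matrix as below. This is a generic illustration of the standard definition, not the paper's or the Cityscapes benchmark's evaluation code; the function name and the toy matrix are made up for the example.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example with made-up pixel counts.
toy = [[8, 2],
       [1, 9]]
print(mean_iou(toy))  # mean of 8/11 and 9/12
```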

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of ESNet, the recognition of its engineering contribution, and the recommendation for minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical architecture proposal describing FCU/PFCU residual modules for semantic segmentation. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims rest on external experimental benchmarks (Cityscapes FPS/accuracy) rather than any self-referential reduction. Self-citations, if present, are not load-bearing for any central result.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The design rests on standard CNN assumptions (residual connections aid optimization, dilated convolutions enlarge receptive field without extra parameters) and on empirical choices of dilation rates and split ratios that are not quantified in the abstract.
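
The statement that dilation enlarges the receptive field without extra parameters follows from the usual formula for the span of a dilated kernel, (k − 1)·r + 1 for kernel size k and rate r. The sketch below applies it to three example rates; the rates 1, 2 and 4 are hypothetical, since the exact values used in the PFCU are not given in the abstract.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in input positions along one axis, covered by a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


kernel_size = 3
example_rates = (1, 2, 4)  # hypothetical rates, not taken from the paper

for rate in example_rates:
    span = effective_kernel_size(kernel_size, rate)
    # The number of weights along the axis stays at kernel_size for every rate.
    print(f"rate={rate}: span={span}, weights={kernel_size}")
```

With a 3-tap kernel the three example rates cover spans of 3, 5 and 9 positions, while each branch still stores 3 weights per axis, which is why parallel branches with different rates can add multi-scale context at a fixed parameter cost.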

free parameters (2)

dilation rates in PFCU
Multiple dilation rates are chosen to enlarge receptive field; exact values and selection method are not stated.
channel widths and block counts
Architecture hyperparameters that determine the 1.6 M parameter count are not listed.

axioms (2)

domain assumption: Factorized 1D convolutions preserve sufficient representational power for segmentation
Invoked when the FCU is presented as a drop-in replacement for standard convolutions.
domain assumption: Transform-split-transform-merge with parallel dilated branches improves accuracy without harming speed
Core justification for the PFCU design.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 9 internal anchors

[1] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS. (2012) 1097–1105

[2] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. (2016) 770–778

[3] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR. (2014) 580–587

[4] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE TPAMI 39 (2017) 640–651

[5] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE TPAMI 40 (2018) 834–848

[6] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.Y.: Pyramid scene parsing network. In: CVPR. (2016) 6230–6239

[7] Xiaoxiao, L., Zhiwei, L., Ping, L., Chenchange, L., Xiaoou, T.: Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. In: CVPR. (2017) 6459–6468

[8] Badrinarayanan, V., Alex, K., Roberto, C.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561 (2015)

[9] Guosheng, L., Anton, M., Chunhua, S., Reid, I.: Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: CVPR. (2017) 5168–5177

[10] Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV. (2015) 1520–1528

[11] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR. (2015) 1–9

[12] Peng, C., Xiangyu, Z., Gang, Y., Guiming, L., Jian, S.: Large kernel matters: Improve semantic segmentation by global convolutional network. In: CVPR. (2017) 1743–1751

[13] Lin, G.S., Shen, C.H., Van, D.H., Reid, I.: Exploring context with deep structured models for semantic segmentation. IEEE TPAMI 40 (2018) 1352–1366

[14] Cong, D., Zhou, Q., Chen, J., Wu, X., Zhang, S., Ou, W., Lu, H.: Can: Contextual aggregating network for semantic segmentation. In: ICASSP. (2019) accepted

[15] Wu, T.Y., Tang, S., Zhang, R., Zhang, Y.D.: Cgnet: A light-weight context guided network for semantic segmentation. arXiv preprint arXiv:1811.08201v1 (2018)

[16] Treml, M., Arjona-Medina, J., Mayr, A., Heusel, M., Widrich, M., Bodenhofer, U., Nessler, B., Hochreiter, S.: Speeding up semantic segmentation for autonomous driving. In: NIPS Workshop. (2016) 1–7

[17] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: Enet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016)

[18] Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., Hajishirzi, H.: Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. arXiv preprint arXiv:1803.06815v3 (2018)

[19] Romera, E., Alvarez, J.M., Bergasa, L.M., Arroyo, R.: Erfnet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE TITS 19 (2018) 263–272

[20] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR. (2016) 3213–3223

[21] Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE TPAMI 35 (2013) 1915–1929

[22] Panqu, W., Pengfei, C., Ye, Y., Ding, L., Zehua, H., Xiaodi, H., Cottrell, G.: Understanding convolution for semantic segmentation. In: WACV. (2018) 1451–1460

[23] Liang-Chieh, C., George, P., F., S., H., A.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)

[24] Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

[25] Ronneberger, O., Philipp, F., Thomas, B.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI. (2015) 225–233

[26] Pohlen, T., Hermans, A., Mathias, M., Leibe, B.: Full-resolution residual networks for semantic segmentation in street scenes. In: CVPR. (2017) 3309–3318

[27] Islam, M.A., Rochan, M., Bruce, N.D.B., Wang, Y.: Gated feedback refinement network for dense image labeling. In: CVPR. (2017) 4877–4885

[28] Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. IJCV 111 (2015) 98–136

[29] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., W.Wang, Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

[30] Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: Xnor-net: Imagenet classification using binary convolutional neural networks. In: ECCV. (2016)

[31] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: CVPR. (2018) 6848–6856

[32] Wu, J., Leng, C., Wang, Y., Hu, Q., Cheng, J.: Quantized convolutional neural networks for mobile devices. In: CVPR. (2016) 5168–5177

[33] Xie, X., Girshick, R., Dollar, P., Tu, Z.W., He, K.M.: Aggregated residual transformations for deep neural networks. In: CVPR. (2017) 5987–5995

[34] Changqian, Y., Jingbo, W., Chao, P., Changxin, G., Gang, Y., Nong, S.: Bisenet: Bilateral segmentation network for real-time semantic segmentation. arXiv preprint arXiv:1808.00897 (2018)

[35] Zhao, H.S., Qi, X.J., Shen, X.Y., Shi, J.P., Jia, J.Y.: Icnet for real-time semantic segmentation on high-resolution images. arXiv preprint arXiv:1704.08545v2 (2018)

[36] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR. (2016) 2818–2826

[37] Zhang, X., Cheny, Z., Wu, Q.M.J., Cai, L., Lu, D., Li, X.: Fast semantic segmentation for scene perception. IEEE TII (2019) accepted
