pith. machine review for the scientific record.

arxiv: 1706.05587 · v3 · submitted 2017-06-17 · 💻 cs.CV

Recognition: no theorem link

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

Pith reviewed 2026-05-12 00:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords semantic image segmentation · atrous convolution · multi-scale context · DeepLabv3 · ASPP · global context · PASCAL VOC 2012 · convolutional networks

The pith

Atrous convolutions in cascaded or parallel modules plus global context enable accurate semantic segmentation without DenseCRF.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper revisits atrous convolution to let deep networks explicitly control their field of view and feature resolution for semantic image segmentation. It introduces modules that apply atrous convolution at multiple rates either in sequence or side-by-side to gather context across object scales. The authors further enlarge their earlier Atrous Spatial Pyramid Pooling module by adding image-level features that supply global scene context. On the PASCAL VOC 2012 benchmark the resulting DeepLabv3 system raises accuracy over earlier DeepLab versions that still needed DenseCRF post-processing and reaches performance comparable to other leading models. Readers care because pixel-accurate labeling is essential for scene understanding yet the new design removes a separate, computationally heavy post-processing stage.

Core claim

By employing atrous convolution in cascaded or parallel arrangements with several rates and by augmenting the Atrous Spatial Pyramid Pooling module with image-level global-context features, the DeepLabv3 architecture captures multi-scale information directly inside the network, yielding significant accuracy gains over prior DeepLab models on PASCAL VOC 2012 without requiring DenseCRF post-processing and matching the performance of contemporary state-of-the-art segmentation systems.

What carries the argument

Atrous Spatial Pyramid Pooling module augmented with image-level global features, together with cascaded or parallel atrous convolution blocks that apply multiple dilation rates.
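
To make the parallel arrangement concrete, below is a minimal sketch of an ASPP-style module with an image-level branch. The rate set (6, 12, 18), the 256-filter branches, and the global-average-pooling branch follow the paper's description at output stride 16; the exact wiring here (batch-norm placement, final projection) is illustrative, and PyTorch is used for convenience while the authors' own implementation is in TensorFlow.

    # Minimal ASPP-with-image-level-features sketch; PyTorch is used here for
    # illustration, while the authors' implementation is in TensorFlow.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ASPPSketch(nn.Module):
        def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
            super().__init__()
            # One 1x1 branch plus one 3x3 atrous branch per rate (the paper's
            # rates for output stride 16), each with batch norm and ReLU.
            self.branches = nn.ModuleList(
                [nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                               nn.BatchNorm2d(out_ch), nn.ReLU())] +
                [nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=r,
                                         dilation=r, bias=False),
                               nn.BatchNorm2d(out_ch), nn.ReLU())
                 for r in rates])
            # Image-level branch: global average pool then a 1x1 conv
            # (batch norm on the 1x1 global feature is omitted in this sketch).
            self.image_pool = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(in_ch, out_ch, 1),
                nn.ReLU())
            self.project = nn.Sequential(
                nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU())

        def forward(self, x):
            h, w = x.shape[-2:]
            feats = [branch(x) for branch in self.branches]
            # Bilinearly broadcast the pooled global context back to the
            # feature resolution before concatenation.
            g = F.interpolate(self.image_pool(x), size=(h, w),
                              mode="bilinear", align_corners=False)
            return self.project(torch.cat(feats + [g], dim=1))

For a typical 2048-channel backbone output, ASPPSketch()(torch.randn(2, 2048, 33, 33)) yields a (2, 256, 33, 33) map ready for a final 1x1 classifier.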

If this is right

  • Multi-scale context can be extracted inside a single forward pass by probing features at several atrous rates in parallel or cascade (the cascaded variant is sketched after this list).
  • Global image-level features can be fused with local convolutional features to improve scene layout understanding.
  • DenseCRF post-processing is no longer required to reach high segmentation accuracy on PASCAL VOC 2012.
  • The same modular atrous design can be inserted into other DCNN backbones to adapt their receptive fields without changing network depth.
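
A hedged sketch of the cascaded variant referenced above: copies of a 3x3 convolutional block are stacked with doubling base rates so the feature resolution stays fixed while the field of view grows, and a multi-grid pattern varies the unit rates inside each block. The channel width and the (1, 2, 4) multi-grid are illustrative defaults rather than the paper's exact backbone configuration.

    import torch.nn as nn

    def cascaded_atrous_stack(channels=256, base_rates=(2, 4, 8, 16),
                              multi_grid=(1, 2, 4)):
        # One block per base rate; each unit's dilation is the block's base
        # rate scaled by its multi-grid entry, so resolution never drops.
        layers = []
        for base in base_rates:
            for unit in multi_grid:
                r = base * unit
                layers += [nn.Conv2d(channels, channels, 3, padding=r,
                                     dilation=r, bias=False),
                           nn.BatchNorm2d(channels),
                           nn.ReLU()]
        return nn.Sequential(*layers)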

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The removal of the DenseCRF stage reduces both inference latency and memory use, which could support deployment on resource-limited devices.
  • Because the modules operate at different scales inside the network, similar rate-scheduling ideas may transfer to other dense-prediction tasks such as depth estimation or instance segmentation.
  • Sharing the exact training recipe allows direct ablation studies that isolate the effect of each atrous configuration on new datasets.

Load-bearing premise

The observed accuracy improvements are produced by the new atrous modules and global-context features rather than by any unstated changes in training schedule, data augmentation, or hyper-parameter choices.

What would settle it

Re-train the previous DeepLabv2 model with exactly the same training schedule, augmentation, and hyper-parameters used for DeepLabv3; if the mean intersection-over-union gap on the PASCAL VOC 2012 validation set closes or reverses, the contribution of the atrous modules is not established.
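
One concrete piece of "exactly the same training schedule" is the poly learning-rate policy used across the DeepLab papers, where the base rate is scaled by (1 - iter/max_iter)^power with power = 0.9. A minimal sketch, with placeholder arguments rather than the paper's full recipe:

    def poly_lr(base_lr: float, step: int, max_steps: int,
                power: float = 0.9) -> float:
        # Polynomially decayed learning rate, as in the DeepLab training recipe.
        return base_lr * (1.0 - step / max_steps) ** power

Pinning this policy, together with crop size, batch size, and augmentation, identically for DeepLabv2 and DeepLabv3 is what would make the comparison clean.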

read the original abstract

In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper revisits atrous convolution for semantic image segmentation, proposing modules that apply atrous convolutions either in cascade or in parallel to capture multi-scale context, and augments the Atrous Spatial Pyramid Pooling (ASPP) module with image-level features to encode global context. It further details training practices and claims that the resulting DeepLabv3 system significantly improves over prior DeepLab versions (without DenseCRF) while attaining performance comparable to other state-of-the-art models on the PASCAL VOC 2012 benchmark.

Significance. If the reported gains are shown to stem from the architectural innovations rather than uncontrolled training variables, the work offers a practical and incremental advance in multi-scale context modeling for dense prediction tasks, building directly on prior ASPP designs with clear implementation guidance that could influence follow-on architectures.

major comments (2)
  1. [Experiments and abstract] Experiments section (and comparisons to prior DeepLab versions): the central claim that performance gains arise from the cascade/parallel atrous modules and image-level feature augmentation requires explicit confirmation that training schedules, data augmentation, and hyper-parameters are identical to those used in the authors' previous DeepLab papers; the abstract's reference to sharing 'implementation details and experience on training our system' does not substitute for matched baselines, leaving open the possibility that gains are confounded by optimization differences.
  2. [§3.2–3.3] §3.2–3.3 (atrous modules and ASPP augmentation): the manuscript should provide controlled ablations that isolate the incremental benefit of the new cascade/parallel atrous rates and the image-level feature branch while holding all training variables fixed, as the current presentation does not fully rule out that observed mIoU lifts are driven by the global context addition alone or by unstated hyper-parameter changes.
minor comments (1)
  1. [Abstract] Abstract: the claims of 'significant improvement' and 'comparable performance' would be more informative if accompanied by the specific mIoU numbers and table references that appear later in the manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will update the paper to provide the requested clarifications and additional controlled experiments.

read point-by-point responses
  1. Referee: [Experiments and abstract] Experiments section (and comparisons to prior DeepLab versions): the central claim that performance gains arise from the cascade/parallel atrous modules and image-level feature augmentation requires explicit confirmation that training schedules, data augmentation, and hyper-parameters are identical to those used in the authors' previous DeepLab papers; the abstract's reference to sharing 'implementation details and experience on training our system' does not substitute for matched baselines, leaving open the possibility that gains are confounded by optimization differences.

    Authors: We agree that explicit confirmation of matched training variables is necessary to attribute gains to the architectural changes. In the revised manuscript we will add a dedicated paragraph and table in the Experiments section that directly compares the training schedule (poly learning-rate policy, iteration count, crop size, batch size), data augmentation (random scaling, horizontal flipping, color jitter), and hyper-parameters used in DeepLabv3 to those reported in our prior DeepLabv2 work, noting only the architecture-specific modifications. This will make clear that the core optimization settings remain identical. revision: yes

  2. Referee: [§3.2–3.3] §3.2–3.3 (atrous modules and ASPP augmentation): the manuscript should provide controlled ablations that isolate the incremental benefit of the new cascade/parallel atrous rates and the image-level feature branch while holding all training variables fixed, as the current presentation does not fully rule out that observed mIoU lifts are driven by the global context addition alone or by unstated hyper-parameter changes.

    Authors: We appreciate the request for stricter isolation of each component. While the current manuscript already reports ablation results for different atrous rates in cascade and parallel settings as well as the addition of the image-level feature branch, we acknowledge that these experiments could be presented more explicitly as incremental, fixed-training-protocol studies. In the revision we will insert a new table that starts from a common baseline and successively adds the cascade module, the parallel module, and the image-level features, reporting mIoU on the PASCAL VOC 2012 validation set under identical training settings for each step. revision: yes
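
A sketch of the incremental, fixed-protocol ablation promised in response 2. The build_model and train_and_eval_miou hooks are hypothetical stand-ins, not functions from the authors' codebase; the protocol values mirror training settings the paper reports (base learning rate 0.007, 513 crop, batch 16) but should be treated as placeholders.

    # Hypothetical ablation harness: one frozen protocol, components added
    # one at a time, mIoU reported per step on PASCAL VOC 2012 val.
    FIXED_PROTOCOL = dict(base_lr=0.007, lr_power=0.9, crop_size=513,
                          batch_size=16, random_flip=True, scale=(0.5, 2.0))

    STEPS = [
        ("baseline", dict()),
        ("+ cascaded atrous block", dict(cascade=True)),
        ("+ parallel ASPP rates", dict(cascade=True, aspp=True)),
        ("+ image-level features", dict(cascade=True, aspp=True,
                                        image_pool=True)),
    ]

    def run_ablation(build_model, train_and_eval_miou):
        for name, components in STEPS:
            model = build_model(**components)                    # hypothetical
            miou = train_and_eval_miou(model, **FIXED_PROTOCOL)  # hypothetical
            print(f"{name}: {miou:.1f} mIoU")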

Circularity Check

0 steps flagged

No circularity: empirical benchmark evaluation with independent results

full rationale

The paper proposes architectural changes (cascaded/parallel atrous convolutions and augmented ASPP) and reports empirical mIoU improvements on PASCAL VOC 2012. There is no derivation chain, no equations, and no 'predictions' that reduce by construction to fitted parameters or self-defined inputs. Benchmark numbers are externally falsifiable and not derived from the authors' prior fits. Self-references to earlier DeepLab versions are normal citations of prior empirical work and do not load-bear any mathematical reduction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard convolutional network assumptions plus the empirical hypothesis that multi-rate atrous modules plus global pooling capture scale variation better than prior designs. No new physical entities or mathematical axioms are introduced.

free parameters (2)
  • atrous rates
    Multiple dilation rates are chosen for the cascade and parallel modules; exact values are not stated in the abstract but are free parameters tuned for the task (see the field-of-view arithmetic after this ledger).
  • training hyper-parameters
    Learning rate schedule, batch size, and data augmentation choices are not detailed in the abstract yet directly affect the reported benchmark numbers.
axioms (1)
  • domain assumption: Deep convolutional networks can be trained end-to-end on labeled segmentation data to produce per-pixel predictions.
    Invoked implicitly when the authors state that the proposed modules improve segmentation performance.
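
For context on why the atrous rates are load-bearing free parameters: the effective kernel of a k x k filter with dilation r spans k + (k-1)(r-1) input positions (standard dilation arithmetic, not code from the paper), so the chosen rates directly set each branch's field of view.

    def effective_kernel(k: int, r: int) -> int:
        # A k x k filter with dilation r covers k + (k-1)(r-1) input positions.
        return k + (k - 1) * (r - 1)

    for r in (1, 6, 12, 18):  # the ASPP rates at output stride 16
        print(r, effective_kernel(3, r))  # -> 3, 13, 25, 37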

pith-pipeline@v0.9.0 · 5455 in / 1355 out tokens · 49147 ms · 2026-05-12T00:20:57.968537+00:00 · methodology

discussion (0)


Forward citations

Cited by 30 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection

    cs.CV 2026-04 unverdicted novelty 7.0

    Noise2Map repurposes diffusion model denoising into a direct predictor for semantic segmentation and change detection tasks in remote sensing, achieving top average ranks on benchmark datasets.

  2. VitaminP: cross-modal learning enables whole-cell segmentation from routine histology

    cs.CV 2026-04 unverdicted novelty 7.0

    VitaminP uses paired H&E-mIF data to train a model that transfers molecular boundary information, enabling accurate whole-cell segmentation directly from routine H&E histology across 34 cancer types.

  3. Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

    cs.CR 2026-04 unverdicted novelty 7.0

    Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

  4. Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline

    cs.CV 2026-04 unverdicted novelty 7.0

    OVRSISBenchV2 is a realistic benchmark expanding scene and category coverage for open-vocabulary remote sensing segmentation, with Pi-Seg baseline showing strong transfer via positive-incentive noise perturbations.

  5. VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation

    cs.CV 2026-04 unverdicted novelty 7.0

    VGGT-Segmentor achieves new state-of-the-art cross-view segmentation on Ego-Exo4D with 67.7% and 68.0% average IoU using a geometry-enhanced model and correspondence-free pretraining that beats most supervised baselines.

  6. Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework

    cs.CV 2026-04 conditional novelty 7.0

    A hierarchical prompt tree with self-reflection graph propagation enables positive forward and backward knowledge transfer in incremental surgical instrument segmentation, improving over baselines by more than 5% and ...

  7. AOI-SSL: Self-Supervised Framework for Efficient Segmentation of Wire-bonded Semiconductors In Optical Inspection

    cs.CV 2026-05 unverdicted novelty 6.0

    AOI-SSL combines small-domain self-supervised pre-training of vision transformers with in-context patch retrieval to reduce labeled data needs and enable fast adaptation for semiconductor wire-bond segmentation.

  8. Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation

    cs.RO 2026-05 unverdicted novelty 6.0

    A compact network called Nano-U trained with quantization-aware distillation enables accurate binary terrain segmentation and runs efficiently on ESP32-S3 microcontrollers for tiny robots.

  9. UnGAP: Uncertainty-Guided Affine Prompting for Real-Time Crack Segmentation

    cs.CV 2026-05 unverdicted novelty 6.0

    UnGAP turns aleatoric uncertainty into an active calibration signal via affine feature modulation to fix gradient suppression in heteroscedastic crack segmentation while maintaining real-time performance.

  10. Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy

    cs.CV 2026-05 unverdicted novelty 6.0

    RGSUD achieves SOTA unsupervised deraining by using IQA-based reward recycling and self-reinforcement to constrain optimization and improve pseudo-paired data quality.

  11. DOT-Sim: Differentiable Optical Tactile Simulation with Precise Real-to-Sim Physical Calibration

    cs.RO 2026-04 unverdicted novelty 6.0

    DOT-Sim uses MPM physics plus learned residual optics to simulate deformable tactile sensors, supporting zero-shot sim-to-real transfer for classification and control tasks.

  12. Diffusion Model as a Generalist Segmentation Learner

    cs.CV 2026-04 unverdicted novelty 6.0

    DiGSeg repurposes diffusion U-Nets as generalist segmentation learners by conditioning on image-mask latents and multi-scale CLIP text features, achieving strong cross-domain performance.

  13. FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

    cs.CV 2026-04 unverdicted novelty 6.0

    FryNet combines RGB and thermal imaging with adversarial regularization to segment oil areas, classify usability, and predict oxidation levels like PV and Totox with high accuracy on video data.

  14. Lorentz Framework for Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    A Lorentz-model hyperbolic framework for semantic segmentation that integrates with Euclidean networks, provides free uncertainty maps, and is validated on ADE20K, COCO-Stuff, Pascal-VOC and Cityscapes using DeepLabV3...

  15. From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.

  16. AIBuildAI: An AI Agent for Automatically Building AI Models

    cs.AI 2026-04 unverdicted novelty 6.0

    AIBuildAI uses a manager agent and three LLM sub-agents to fully automate AI model development and achieves a 63.1% medal rate on MLE-Bench, matching experienced human engineers.

  17. GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality

    cs.CV 2026-04 unverdicted novelty 6.0

    GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-on...

  18. Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization

    cs.CV 2026-04 unverdicted novelty 6.0

    A large pool of diverse artistic styles for style-transfer augmentation improves domain generalization in driving vision models more than repeated use of few styles or domain-matched styles, yielding the lightweight S...

  19. CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    CrossWeaver introduces MIB and SAF modules to enable flexible, reliability-aware cross-modal interaction and fusion, achieving SOTA multimodal semantic segmentation with minimal parameters and generalization to unseen...

  20. Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    DGM-Net reaches 82.3% mIoU on Cityscapes and 45.24% on ADE20K using directional geometric guidance inside a linear-complexity Mamba backbone, without heavy pretraining or large models.

  21. AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos

    cs.CV 2026-04 unverdicted novelty 5.0

    AutoAWG generates controllable adverse weather automotive videos via semantics-guided adaptive multi-control fusion and vanishing-point-anchored temporal synthesis from static images, reducing FID by 50% and FVD by 16...

  22. DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    DeltaSeg, a tiered-attention U-Net variant with a novel Deep Delta Attention module, outperforms 12 prior models on two multi-class structural defect segmentation benchmarks.

  23. WILD-SAM: Phase-Aware Expert Adaptation of SAM for Landslide Detection in Wrapped InSAR Interferograms

    cs.CV 2026-04 unverdicted novelty 5.0

    WILD-SAM is a fine-tuned SAM variant using phase-aware MoE adapters and wavelet subband enhancement that achieves state-of-the-art landslide detection on wrapped InSAR data.

  24. HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    HQF-Net reports mIoU gains on three remote-sensing benchmarks by adding quantum circuits to skip connections and a mixture-of-experts bottleneck inside a classical U-Net fused with a DINOv3 backbone.

  25. FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation

    cs.CV 2026-05 unverdicted novelty 4.0

    FoR-Net improves efficiency in semantic segmentation by focusing on hard regions with a learned selector and multi-scale convolutions, achieving competitive results on Cityscapes.

  26. An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving

    cs.CV 2026-04 unverdicted novelty 4.0

    A decision-aware multi-scale attention network generates tailored explanations for autonomous driving choices and outperforms prior models on F1 and a new Joint F1 metric across two datasets.

  27. A Benchmark Study of Segmentation Models and Adaptation Strategies for Landslide Detection from Satellite Imagery

    cs.CV 2026-04 unverdicted novelty 4.0

    Transformer-based models deliver strong landslide segmentation on satellite images, and parameter-efficient fine-tuning matches full fine-tuning accuracy while cutting trainable parameters by up to 95%.

  28. UA-Net: Uncertainty-Aware Network for TRISO Image Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 4.0

    UA-Net segments TRISO fuel micrographs into five regions with 95.5% mIoU and 97.3% mP on 102 test images, while its meta-model detects misclassifications at 91.8% specificity and 93.5% sensitivity.

  29. EDFNet: Early Fusion of Edge and Depth for Thin-Obstacle Segmentation in UAV Navigation

    cs.CV 2026-04 unverdicted novelty 4.0

    Early RGB-Depth-Edge fusion in EDFNet provides a competitive baseline for thin-obstacle segmentation on the DDOS dataset, with the best pretrained U-Net model reaching 0.244 Thin-Structure Evaluation Score.

  30. ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine

    cs.CV 2026-04 unverdicted novelty 3.0

    ResAF-Net detects trees in satellite imagery for Palestinian agriculture, reaching 82% recall and 63% mAP@0.5 on the MillionTrees validation set and deployed in a web GIS.

Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages · cited by 30 Pith papers · 3 internal anchors

  1. [1]

    M. Abadi, A. Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.

  2. [2]

    A. Adams, J. Baek, and M. A. Davis. Fast high-dimensional filtering using the permutohedral lattice. In Eurographics, 2010.

  3. [3]

    V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561, 2015.

  4. [4]

    A. Brandt. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation, 31(138):333–390, 1977.

  5. [5]

    W. L. Briggs, V. E. Henson, and S. F. McCormick. A Multigrid Tutorial. SIAM, 2000.

  6. [6]

    W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki. Scene labeling with LSTM recurrent neural networks. In CVPR, 2015.

  7. [7]

    H. Caesar, J. Uijlings, and V. Ferrari. COCO-Stuff: Thing and stuff classes in context. arXiv:1612.03716, 2016.

  8. [8]

    S. Chandra and I. Kokkinos. Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs. arXiv:1603.08358, 2016.

  9. [9]

    L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille. Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In CVPR, 2016.

  10. [10]

    L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.

  11. [11]

    L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915, 2016.

  12. [12]

    L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille. Attention to scale: Scale-aware semantic image segmentation. In CVPR, 2016.

  13. [13]

    F. Chollet. Xception: Deep learning with depthwise separable convolutions. arXiv:1610.02357, 2016.

  14. [14]

    M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.

  15. [15]

    J. Dai, K. He, and J. Sun. Convolutional feature masking for joint object and stuff segmentation. arXiv:1412.1283, 2014.

  16. [16]

    J. Dai, K. He, and J. Sun. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In ICCV, 2015.

  17. [17]

    J. Dai, Y. Li, K. He, and J. Sun. R-FCN: Object detection via region-based fully convolutional networks. arXiv:1605.06409, 2016.

  18. [18]

    J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. arXiv:1703.06211, 2017.

  19. [19]

    D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. arXiv:1411.4734, 2014.

  20. [20]

    M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 2014.

  21. [21]

    H. Fan, X. Mei, D. Prokhorov, and H. Ling. Multi-level contextual RNNs with attention model for scene labeling. arXiv:1607.02537, 2016.

  22. [22]

    C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. PAMI, 2013.

  23. [23]

    J. Fu, J. Liu, Y. Wang, and H. Lu. Stacked deconvolutional network for semantic segmentation. arXiv:1708.04943, 2017.

  24. [24]

    R. Gadde, V. Jampani, and P. V. Gehler. Semantic video CNNs through representation warping. In ICCV, 2017.

  25. [25]

    G. Ghiasi and C. C. Fowlkes. Laplacian reconstruction and refinement for semantic segmentation. arXiv:1605.02264, 2016.

  26. [26]

    A. Giusti, D. Ciresan, J. Masci, L. Gambardella, and J. Schmidhuber. Fast image scanning with deep max-pooling convolutional neural networks. In ICIP, 2013.

  27. [27]

    S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, 2009.

  28. [28]

    K. Grauman and T. Darrell. The pyramid match kernel: Discriminative classification with sets of image features. In ICCV, 2005.

  29. [29]

    B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In ICCV, 2011.

  30. [30]

    B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.

  31. [31]

    K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.

  32. [32]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv:1512.03385, 2015.

  33. [33]

    X. He, R. S. Zemel, and M. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, 2004.

  34. [34]

    G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS, 2014.

  35. [35]

    S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

  36. [36]

    M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space, pages 289–297, 1989.

  37. [37]

    J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR, 2017.

  38. [38]

    S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.

  39. [39]

    M. A. Islam, M. Rochan, N. D. Bruce, and Y. Wang. Gated feedback refinement network for dense image labeling. In CVPR, 2017.

  40. [40]

    S. D. Jain, B. Xiong, and K. Grauman. FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In CVPR, 2017.

  41. [41]

    V. Jampani, M. Kiefel, and P. V. Gehler. Learning sparse high dimensional filters: Image filtering, dense CRFs and bilateral neural networks. In CVPR, 2016.

  42. [42]

    X. Jin, X. Li, H. Xiao, X. Shen, Z. Lin, J. Yang, Y. Chen, J. Dong, L. Liu, Z. Jie, J. Feng, and S. Yan. Video scene parsing with predictive feature learning. In ICCV, 2017.

  43. [43]

    P. Kohli, P. H. Torr, et al. Robust higher order potentials for enforcing label consistency. IJCV, 82(3):302–324, 2009.

  44. [44]

    S. Kong and C. Fowlkes. Recurrent scene parsing with perspective understanding in the loop. arXiv:1705.07238, 2017.

  45. [45]

    P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.

  46. [46]

    I. Krešo, S. Šegvić, and J. Krapac. Ladder-style DenseNets for semantic segmentation of large natural images. In ICCV CVRSUAD Workshop, 2017.

  47. [47]

    A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

  48. [48]

    L. Ladicky, C. Russell, P. Kohli, and P. H. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009.

  49. [49]

    S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

  50. [50]

    Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.

  51. [51]

    X. Li, Z. Jie, W. Wang, C. Liu, J. Yang, X. Shen, Z. Lin, Q. Chen, S. Yan, and J. Feng. FoveaNet: Perspective-aware urban scene parsing. arXiv:1708.02421, 2017.

  52. [52]

    X. Li, Z. Liu, P. Luo, C. C. Loy, and X. Tang. Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. arXiv:1704.01344, 2017.

  53. [53]

    X. Liang, X. Shen, D. Xiang, J. Feng, L. Lin, and S. Yan. Semantic object parsing with local-global long short-term memory. arXiv:1511.04510, 2015.

  54. [54]

    G. Lin, A. Milan, C. Shen, and I. Reid. RefineNet: Multi-path refinement networks with identity mappings for high-resolution semantic segmentation. arXiv:1611.06612, 2016.

  55. [55]

    G. Lin, C. Shen, I. Reid, et al. Efficient piecewise training of deep structured models for semantic segmentation. arXiv:1504.01013, 2015.

  56. [56]

    T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. arXiv:1612.03144, 2016.

  57. [57]

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

  58. [58]

    W. Liu, A. Rabinovich, and A. C. Berg. ParseNet: Looking wider to see better. arXiv:1506.04579, 2015.

  59. [59]

    Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.

  60. [60]

    J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.

  61. [61]

    P. Luo, G. Wang, L. Lin, and X. Wang. Deep dual learning for semantic image segmentation. In ICCV, 2017.

  62. [62]

    M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich. Feedforward semantic segmentation with zoom-out features. In CVPR, 2015.

  63. [63]

    R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. The role of context for object detection and semantic segmentation in the wild. In CVPR, 2014.

  64. [64]

    H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.

  65. [65]

    G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille. Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. In ICCV, 2015.

  66. [66]

    G. Papandreou, I. Kokkinos, and P.-A. Savalle. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. In CVPR, 2015.

  67. [67]

    G. Papandreou and P. Maragos. Multigrid geometric active contour models. TIP, 16(1):229–240, 2007.

  68. [68]

    C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun. Large kernel matters–improve semantic segmentation by global convolutional network. arXiv:1703.02719, 2017.

  69. [69]

    P. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.

  70. [70]

    T. Pohlen, A. Hermans, M. Mathias, and B. Leibe. Full-resolution residual networks for semantic segmentation in street scenes. arXiv:1611.08323, 2016.

  71. [71]

    O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.

  72. [72]

    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.

  73. [73]

    A. G. Schwing and R. Urtasun. Fully connected deep structured networks. arXiv:1503.02351, 2015.

  74. [74]

    P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.

  75. [75]

    F. Shen, R. Gan, S. Yan, and G. Zeng. Semantic segmentation via structured patch prediction, context CRF and guidance CRF. In CVPR, 2017.

  76. [76]

    J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 2009.

  77. [77]

    A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta. Beyond skip connections: Top-down modulation for object detection. arXiv:1612.06851, 2016.

  78. [78]

    K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

  79. [79]

    C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV, 2017.

  80. [80]

    H. Sun, D. Xie, and S. Pu. Mixed context networks for semantic segmentation. arXiv:1610.05854, 2016.

Showing first 80 references.