Rethinking Atrous Convolution for Semantic Image Segmentation
Pith reviewed 2026-05-12 00:20 UTC · model grok-4.3
The pith
Atrous convolutions in cascaded or parallel modules plus global context enable accurate semantic segmentation without DenseCRF.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By employing atrous convolution in cascaded or parallel arrangements with several rates, and by augmenting the Atrous Spatial Pyramid Pooling module with image-level global-context features, the DeepLabv3 architecture captures multi-scale information directly inside the network. This yields significant accuracy gains over prior DeepLab models on PASCAL VOC 2012 without requiring DenseCRF post-processing, and matches the performance of contemporary state-of-the-art segmentation systems.
What carries the argument
Atrous Spatial Pyramid Pooling module augmented with image-level global features, together with cascaded or parallel atrous convolution blocks that apply multiple dilation rates.
If this is right
- Multi-scale context can be extracted inside a single forward pass by probing features at several atrous rates in parallel or cascade.
- Global image-level features can be fused with local convolutional features to improve scene layout understanding.
- DenseCRF post-processing is no longer required to reach high segmentation accuracy on PASCAL VOC 2012.
- The same modular atrous design can be inserted into other DCNN backbones to adapt their receptive fields without changing network depth.
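The parallel arrangement these points describe can be made concrete. The PyTorch sketch below (not the authors' released code; the rates, channel widths, and the final 1x1 projection are illustrative choices) shows an ASPP-style head: several 3x3 convolutions with different dilation rates run alongside a 1x1 branch and an image-level pooling branch, and their outputs are concatenated and projected.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Sketch of an ASPP head with an image-level branch.

    Rates (6, 12, 18) and 256 output channels are illustrative,
    not the paper's exact configuration at every output stride.
    """
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus one 3x3 atrous branch per rate; with
        # padding == dilation, each branch preserves spatial size.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        # Image-level branch: global average pool -> 1x1 conv -> upsample.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False))
        # Fuse all parallel branches with a 1x1 projection.
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        img = F.interpolate(self.image_pool(x), size=(h, w),
                            mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [img], dim=1))

x = torch.randn(1, 512, 33, 33)
y = ASPP(512)(x)  # -> torch.Size([1, 256, 33, 33])
```

Because every branch preserves the input resolution, the whole module can be dropped on top of a backbone's final feature map in a single forward pass.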
Where Pith is reading between the lines
- The removal of the DenseCRF stage reduces both inference latency and memory use, which could support deployment on resource-limited devices.
- Because the modules operate at different scales inside the network, similar rate-scheduling ideas may transfer to other dense-prediction tasks such as depth estimation or instance segmentation.
- Sharing the exact training recipe allows direct ablation studies that isolate the effect of each atrous configuration on new datasets.
Load-bearing premise
The observed accuracy improvements are produced by the new atrous modules and global-context features rather than by any unstated changes in training schedule, data augmentation, or hyper-parameter choices.
What would settle it
Re-train the previous DeepLabv2 model with exactly the same training schedule, augmentation, and hyper-parameters used for DeepLabv3; if the mean intersection-over-union gap on the PASCAL VOC 2012 validation set closes or reverses, the contribution of the atrous modules is not established.
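The premise behind this test is that the two training recipes must differ only in architecture. A minimal sketch of that bookkeeping (the config values here are hypothetical placeholders, not the papers' exact settings) simply diffs the two recipes before any mIoU comparison is attributed to the modules:

```python
# Hypothetical training configs for the two models; any key outside
# "arch" that appears in the diff is a potential confounder.
v2 = {"lr_policy": "poly", "base_lr": 0.007, "crop": 321, "arch": "deeplabv2"}
v3 = {"lr_policy": "poly", "base_lr": 0.007, "crop": 513, "arch": "deeplabv3"}

diff = {k for k in v2 if v2[k] != v3[k]}
confounders = diff - {"arch"}  # non-architectural differences to equalize
```

With these placeholder values, `confounders` would flag the crop size, which is exactly the kind of optimization difference the settling experiment must rule out.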
Original abstract
In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. We also elaborate on implementation details and share our experience on training our system. The proposed `DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
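For readers unfamiliar with the operation the abstract revisits, the NumPy sketch below (the helper `atrous_conv2d` is an illustration, not library code) shows how a dilation rate enlarges a filter's field-of-view without adding parameters or reducing the resolution of the response map:

```python
import numpy as np

def atrous_conv2d(x, w, rate):
    """Single-channel 2D atrous (dilated) convolution with 'same' padding.

    A k x k kernel at dilation `rate` covers an effective field-of-view
    of k + (k - 1) * (rate - 1) pixels per side: same parameter count,
    same output resolution, larger receptive field.
    """
    k = w.shape[0]
    eff = k + (k - 1) * (rate - 1)        # effective kernel extent
    pad = eff // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the input with gaps of size `rate` between taps.
            patch = xp[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = (patch * w).sum()
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
w = np.ones((3, 3)) / 9.0
y1 = atrous_conv2d(x, w, rate=1)  # standard 3x3 convolution
y2 = atrous_conv2d(x, w, rate=2)  # same 9 weights, 5x5 field-of-view
```

Both outputs have the input's 7x7 shape; only the spacing of the sampled taps changes, which is what lets a network widen its receptive field without extra depth or downsampling.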
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper revisits atrous convolution for semantic image segmentation, proposing modules that apply atrous convolutions either in cascade or in parallel to capture multi-scale context, and augments the Atrous Spatial Pyramid Pooling (ASPP) module with image-level features to encode global context. It further details training practices and claims that the resulting DeepLabv3 system significantly improves over prior DeepLab versions (without DenseCRF) while attaining performance comparable to other state-of-the-art models on the PASCAL VOC 2012 benchmark.
Significance. If the reported gains are shown to stem from the architectural innovations rather than uncontrolled training variables, the work offers a practical and incremental advance in multi-scale context modeling for dense prediction tasks, building directly on prior ASPP designs with clear implementation guidance that could influence follow-on architectures.
Major comments (2)
- [Experiments and abstract] Experiments section (and comparisons to prior DeepLab versions): the central claim that performance gains arise from the cascade/parallel atrous modules and image-level feature augmentation requires explicit confirmation that training schedules, data augmentation, and hyper-parameters are identical to those used in the authors' previous DeepLab papers; the abstract's reference to sharing 'implementation details and experience on training our system' does not substitute for matched baselines, leaving open the possibility that gains are confounded by optimization differences.
- [§3.2–3.3] §3.2–3.3 (atrous modules and ASPP augmentation): the manuscript should provide controlled ablations that isolate the incremental benefit of the new cascade/parallel atrous rates and the image-level feature branch while holding all training variables fixed, as the current presentation does not fully rule out that observed mIoU lifts are driven by the global context addition alone or by unstated hyper-parameter changes.
Minor comments (1)
- [Abstract] Abstract: the claims of 'significant improvement' and 'comparable performance' would be more informative if accompanied by the specific mIoU numbers and table references that appear later in the manuscript.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will update the paper to provide the requested clarifications and additional controlled experiments.
Point-by-point responses
- Referee: [Experiments and abstract] Experiments section (and comparisons to prior DeepLab versions): the central claim that performance gains arise from the cascade/parallel atrous modules and image-level feature augmentation requires explicit confirmation that training schedules, data augmentation, and hyper-parameters are identical to those used in the authors' previous DeepLab papers; the abstract's reference to sharing 'implementation details and experience on training our system' does not substitute for matched baselines, leaving open the possibility that gains are confounded by optimization differences.
Authors: We agree that explicit confirmation of matched training variables is necessary to attribute gains to the architectural changes. In the revised manuscript we will add a dedicated paragraph and table in the Experiments section that directly compares the training schedule (poly learning-rate policy, iteration count, crop size, batch size), data augmentation (random scaling, horizontal flipping, color jitter), and hyper-parameters used in DeepLabv3 to those reported in our prior DeepLabv2 work, noting only the architecture-specific modifications. This will make clear that the core optimization settings remain identical. revision: yes
- Referee: [§3.2–3.3] §3.2–3.3 (atrous modules and ASPP augmentation): the manuscript should provide controlled ablations that isolate the incremental benefit of the new cascade/parallel atrous rates and the image-level feature branch while holding all training variables fixed, as the current presentation does not fully rule out that observed mIoU lifts are driven by the global context addition alone or by unstated hyper-parameter changes.
Authors: We appreciate the request for stricter isolation of each component. While the current manuscript already reports ablation results for different atrous rates in cascade and parallel settings as well as the addition of the image-level feature branch, we acknowledge that these experiments could be presented more explicitly as incremental, fixed-training-protocol studies. In the revision we will insert a new table that starts from a common baseline and successively adds the cascade module, the parallel module, and the image-level features, reporting mIoU on the PASCAL VOC 2012 validation set under identical training settings for each step. revision: yes
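The "poly" learning-rate policy cited in the first response can be sketched in a few lines (the base rate and iteration count below are illustrative values, not necessarily the paper's exact settings):

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """'Poly' learning-rate policy used in the DeepLab training recipes:
    the rate decays from base_lr toward 0 as (1 - step/max_steps)**power."""
    return base_lr * (1.0 - step / max_steps) ** power

# Illustrative schedule: decays monotonically from 0.007 at step 0
# down to 0.0 at the final step.
lrs = [poly_lr(0.007, s, 30000) for s in (0, 15000, 30000)]
```

Holding this schedule (and the other listed settings) fixed across DeepLabv2 and DeepLabv3 runs is what would make the comparison table the authors promise interpretable.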
Circularity Check
No circularity: empirical benchmark evaluation with independent results
Full rationale
The paper proposes architectural changes (cascaded/parallel atrous convolutions and augmented ASPP) and reports empirical mIoU improvements on PASCAL VOC 2012. There is no derivation chain, no equations, and no 'predictions' that reduce by construction to fitted parameters or self-defined inputs. Benchmark numbers are externally falsifiable and not derived from the authors' prior fits. Self-references to earlier DeepLab versions are normal citations of prior empirical work and do not load-bear any mathematical reduction.
Axiom & Free-Parameter Ledger
Free parameters (2)
- atrous rates
- training hyper-parameters
Axioms (1)
- Domain assumption: Deep convolutional networks can be trained end-to-end on labeled segmentation data to produce per-pixel predictions.
Forward citations
Cited by 30 Pith papers
- Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection. Noise2Map repurposes diffusion model denoising into a direct predictor for semantic segmentation and change detection tasks in remote sensing, achieving top average ranks on benchmark datasets.
- VitaminP: cross-modal learning enables whole-cell segmentation from routine histology. VitaminP uses paired H&E-mIF data to train a model that transfers molecular boundary information, enabling accurate whole-cell segmentation directly from routine H&E histology across 34 cancer types.
- Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading. Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
- Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline. OVRSISBenchV2 is a realistic benchmark expanding scene and category coverage for open-vocabulary remote sensing segmentation, with Pi-Seg baseline showing strong transfer via positive-incentive noise perturbations.
- VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation. VGGT-Segmentor achieves new state-of-the-art cross-view segmentation on Ego-Exo4D with 67.7% and 68.0% average IoU using a geometry-enhanced model and correspondence-free pretraining that beats most supervised baselines.
- Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework. A hierarchical prompt tree with self-reflection graph propagation enables positive forward and backward knowledge transfer in incremental surgical instrument segmentation, improving over baselines by more than 5% and ...
- AOI-SSL: Self-Supervised Framework for Efficient Segmentation of Wire-bonded Semiconductors In Optical Inspection. AOI-SSL combines small-domain self-supervised pre-training of vision transformers with in-context patch retrieval to reduce labeled data needs and enable fast adaptation for semiconductor wire-bond segmentation.
- Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation. A compact network called Nano-U trained with quantization-aware distillation enables accurate binary terrain segmentation and runs efficiently on ESP32-S3 microcontrollers for tiny robots.
- UnGAP: Uncertainty-Guided Affine Prompting for Real-Time Crack Segmentation. UnGAP turns aleatoric uncertainty into an active calibration signal via affine feature modulation to fix gradient suppression in heteroscedastic crack segmentation while maintaining real-time performance.
- Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy. RGSUD achieves SOTA unsupervised deraining by using IQA-based reward recycling and self-reinforcement to constrain optimization and improve pseudo-paired data quality.
- DOT-Sim: Differentiable Optical Tactile Simulation with Precise Real-to-Sim Physical Calibration. DOT-Sim uses MPM physics plus learned residual optics to simulate deformable tactile sensors, supporting zero-shot sim-to-real transfer for classification and control tasks.
- Diffusion Model as a Generalist Segmentation Learner. DiGSeg repurposes diffusion U-Nets as generalist segmentation learners by conditioning on image-mask latents and multi-scale CLIP text features, achieving strong cross-domain performance.
- FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment. FryNet combines RGB and thermal imaging with adversarial regularization to segment oil areas, classify usability, and predict oxidation levels like PV and Totox with high accuracy on video data.
- Lorentz Framework for Semantic Segmentation. A Lorentz-model hyperbolic framework for semantic segmentation that integrates with Euclidean networks, provides free uncertainty maps, and is validated on ADE20K, COCO-Stuff, Pascal-VOC and Cityscapes using DeepLabV3...
- From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation. Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
- AIBuildAI: An AI Agent for Automatically Building AI Models. AIBuildAI uses a manager agent and three LLM sub-agents to fully automate AI model development and achieves a 63.1% medal rate on MLE-Bench, matching experienced human engineers.
- GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality. GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-on...
- Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization. A large pool of diverse artistic styles for style-transfer augmentation improves domain generalization in driving vision models more than repeated use of few styles or domain-matched styles, yielding the lightweight S...
- CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation. CrossWeaver introduces MIB and SAF modules to enable flexible, reliability-aware cross-modal interaction and fusion, achieving SOTA multimodal semantic segmentation with minimal parameters and generalization to unseen...
- Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation. DGM-Net reaches 82.3% mIoU on Cityscapes and 45.24% on ADE20K using directional geometric guidance inside a linear-complexity Mamba backbone, without heavy pretraining or large models.
- AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos. AutoAWG generates controllable adverse weather automotive videos via semantics-guided adaptive multi-control fusion and vanishing-point-anchored temporal synthesis from static images, reducing FID by 50% and FVD by 16...
- DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation. DeltaSeg, a tiered-attention U-Net variant with a novel Deep Delta Attention module, outperforms 12 prior models on two multi-class structural defect segmentation benchmarks.
- WILD-SAM: Phase-Aware Expert Adaptation of SAM for Landslide Detection in Wrapped InSAR Interferograms. WILD-SAM is a fine-tuned SAM variant using phase-aware MoE adapters and wavelet subband enhancement that achieves state-of-the-art landslide detection on wrapped InSAR data.
- HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation. HQF-Net reports mIoU gains on three remote-sensing benchmarks by adding quantum circuits to skip connections and a mixture-of-experts bottleneck inside a classical U-Net fused with a DINOv3 backbone.
- FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation. FoR-Net improves efficiency in semantic segmentation by focusing on hard regions with a learned selector and multi-scale convolutions, achieving competitive results on Cityscapes.
- An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving. A decision-aware multi-scale attention network generates tailored explanations for autonomous driving choices and outperforms prior models on F1 and a new Joint F1 metric across two datasets.
- A Benchmark Study of Segmentation Models and Adaptation Strategies for Landslide Detection from Satellite Imagery. Transformer-based models deliver strong landslide segmentation on satellite images, and parameter-efficient fine-tuning matches full fine-tuning accuracy while cutting trainable parameters by up to 95%.
- UA-Net: Uncertainty-Aware Network for TRISO Image Semantic Segmentation. UA-Net segments TRISO fuel micrographs into five regions with 95.5% mIoU and 97.3% mP on 102 test images, while its meta-model detects misclassifications at 91.8% specificity and 93.5% sensitivity.
- EDFNet: Early Fusion of Edge and Depth for Thin-Obstacle Segmentation in UAV Navigation. Early RGB-Depth-Edge fusion in EDFNet provides a competitive baseline for thin-obstacle segmentation on the DDOS dataset, with the best pretrained U-Net model reaching 0.244 Thin-Structure Evaluation Score.
- ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine. ResAF-Net detects trees in satellite imagery for Palestinian agriculture, reaching 82% recall and 63% mAP@0.5 on the MillionTrees validation set and deployed in a web GIS.
Reference graph
Works this paper leans on
- [1] M. Abadi, A. Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.
- [2] A. Adams, J. Baek, and M. A. Davis. Fast high-dimensional filtering using the permutohedral lattice. In Eurographics, 2010.
- [3] V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561, 2015.
- [4] A. Brandt. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation, 31(138):333–390, 1977.
- [5] W. L. Briggs, V. E. Henson, and S. F. McCormick. A Multigrid Tutorial. SIAM, 2000.
- [6]
- [7]
- [8] S. Chandra and I. Kokkinos. Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs. arXiv:1603.08358, 2016.
- [9] L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille. Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. In CVPR, 2016.
- [10] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.
- [11]
- [12] L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille. Attention to scale: Scale-aware semantic image segmentation. In CVPR, 2016.
- [13]
- [14]
- [15]
- [16] J. Dai, K. He, and J. Sun. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In ICCV, 2015.
- [17]
- [18]
- [19] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. arXiv:1411.4734, 2014.
- [20] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 2014.
- [21]
- [22] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. PAMI, 2013.
- [23]
- [24]
- [25] G. Ghiasi and C. C. Fowlkes. Laplacian reconstruction and refinement for semantic segmentation. arXiv:1605.02264, 2016.
- [26]
- [27]
- [28] K. Grauman and T. Darrell. The pyramid match kernel: Discriminative classification with sets of image features. In ICCV, 2005.
- [29] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In ICCV, 2011.
- [30] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
- [31] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
- [32] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv:1512.03385, 2015.
- [33] X. He, R. S. Zemel, and M. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, 2004.
- [34]
- [35] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
- [36] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space, pages 289–297, 1989.
- [37]
- [38] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
- [39] M. A. Islam, M. Rochan, N. D. Bruce, and Y. Wang. Gated feedback refinement network for dense image labeling. In CVPR, 2017.
- [40] S. D. Jain, B. Xiong, and K. Grauman. FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In CVPR, 2017.
- [41] V. Jampani, M. Kiefel, and P. V. Gehler. Learning sparse high dimensional filters: Image filtering, dense CRFs and bilateral neural networks. In CVPR, 2016.
- [42] X. Jin, X. Li, H. Xiao, X. Shen, Z. Lin, J. Yang, Y. Chen, J. Dong, L. Liu, Z. Jie, J. Feng, and S. Yan. Video scene parsing with predictive feature learning. In ICCV, 2017.
- [43]
- [44] S. Kong and C. Fowlkes. Recurrent scene parsing with perspective understanding in the loop. arXiv:1705.07238, 2017.
- [45] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.
- [46]
- [47] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- [48] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009.
- [49] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
- [50]
- [51]
- [52]
- [53]
- [54]
- [55]
- [56]
- [57] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- [58]
- [59] Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
- [60] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
- [61] P. Luo, G. Wang, L. Lin, and X. Wang. Deep dual learning for semantic image segmentation. In ICCV, 2017.
- [62] M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich. Feedforward semantic segmentation with zoom-out features. In CVPR, 2015.
- [63] R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. The role of context for object detection and semantic segmentation in the wild. In CVPR, 2014.
- [64] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.
- [65] G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille. Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. In ICCV, 2015.
- [66] G. Papandreou, I. Kokkinos, and P.-A. Savalle. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. In CVPR, 2015.
- [67] G. Papandreou and P. Maragos. Multigrid geometric active contour models. TIP, 16(1):229–240, 2007.
- [68]
- [69] P. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.
- [70]
- [71] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
- [72] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.
- [73]
- [74] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.
- [75] F. Shen, R. Gan, S. Yan, and G. Zeng. Semantic segmentation via structured patch prediction, context CRF and guidance CRF. In CVPR, 2017.
- [76] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 2009.
- [77] A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta. Beyond skip connections: Top-down modulation for object detection. arXiv:1612.06851, 2016.
- [78] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- [79] C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV, 2017.
- [80]