pith. machine review for the scientific record.

arxiv: 1511.07122 · v3 · submitted 2015-11-23 · 💻 cs.CV

Recognition: unknown

Multi-Scale Context Aggregation by Dilated Convolutions

Fisher Yu, Vladlen Koltun

classification 💻 cs.CV
keywords classification, convolutions, dense, dilated, image, module, prediction, accuracy

State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification are structurally different. In this work, we develop a new convolutional network module that is specifically designed for dense prediction. The presented module uses dilated convolutions to systematically aggregate multi-scale contextual information without losing resolution. The architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. We show that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems. In addition, we examine the adaptation of image classification networks to dense prediction and show that simplifying the adapted network can increase accuracy.
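The claim about exponential receptive-field growth can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper's code: it assumes stride-1 layers with 3-wide kernels and dilation doubling at each layer (1, 2, 4, ...), and the helper names `receptive_field` and `dilated_conv1d` are hypothetical. The 1-D operator is written as a correlation (no kernel flip) for readability.

```python
def receptive_field(dilations, kernel_size=3):
    """Side length of the receptive field of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated correlation: out[p] = sum_t kernel[t] * signal[p + dilation * t].

    The output has one value per position, so resolution is only reduced
    at the borders (no pooling or striding is involved).
    """
    span = dilation * (len(kernel) - 1)
    return [
        sum(kernel[t] * signal[p + dilation * t] for t in range(len(kernel)))
        for p in range(len(signal) - span)
    ]


# Assumed schedule: dilation doubles per layer (1, 2, 4, 8).
exponential = [2 ** i for i in range(4)]
plain = [1] * 4  # same depth, no dilation

rf_exponential = receptive_field(exponential)  # 1 + 2 * (1 + 2 + 4 + 8) = 31
rf_plain = receptive_field(plain)              # 1 + 2 * 4 = 9
```

With the same number of layers and the same number of weights per layer, the doubled-dilation stack covers a field of 31 samples per side versus 9 for the undilated stack, which is the sense in which the growth is exponential in depth rather than linear.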

This paper has not been read by Pith yet.

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WaveNet: A Generative Model for Raw Audio

    cs.SD 2016-09 accept novelty 9.0

    WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

  2. Density estimation using Real NVP

    cs.LG 2016-05 accept novelty 8.0

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

  3. WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning

    cs.CV 2026-05 unverdicted novelty 7.0

    WD-FQDet decouples modality-shared and modality-specific features in infrared-visible images via wavelet-based frequency decomposition and frequency-aware query selection to achieve state-of-the-art detection performance.

  4. Graph-based Semantic Calibration Network for Unaligned UAV RGBT Image Semantic Segmentation and A Large-scale Benchmark

    cs.CV 2026-04 unverdicted novelty 7.0

    GSCNet with FDAM and SGCM modules plus the URTF benchmark improves fine-grained semantic segmentation on unaligned UAV RGBT images.

  5. KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition

    cs.CV 2026-04 unverdicted novelty 7.0

    KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.

  6. Cross-Stage Attention Propagation for Efficient Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 7.0

    CSAP computes attention at the deepest scale and propagates the maps to shallower stages, bypassing per-scale query-key computations to cut decoder FLOPs while preserving multi-scale performance and beating SegNeXt-Ti...

  7. Rethink the Role of Neural Decoders in Quantum Error Correction

    quant-ph 2026-05 unverdicted novelty 6.0

    Neural decoders for surface-code QEC achieve practical microsecond FPGA latency when trained on large datasets with appropriate inductive biases and INT4 quantization, rather than relying on architectural complexity.

  8. YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts

    cs.LG 2026-05 unverdicted novelty 6.0

    YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...

  9. AI-Empowered Low-Altitude Economy: Cooperative Sensing With Fixed Wireless Access

    eess.SP 2026-05 unverdicted novelty 5.0

    Cooperative AI sensing with FWA CPEs using CSI features, attention, and Transformer achieves 0.63% missed detection and 6.5m positioning error for UAVs.

  10. Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments

    physics.optics 2026-05 unverdicted novelty 5.0

    Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-spec...

  11. Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    DGM-Net reaches 82.3% mIoU on Cityscapes and 45.24% on ADE20K using directional geometric guidance inside a linear-complexity Mamba backbone, without heavy pretraining or large models.

  12. CNNs for Vis-NIR Chemometrics: From Contradiction to Conditional Design

    cs.LG 2026-05 unverdicted novelty 4.0

    Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.

  13. Cross-subject Muscle Fatigue Detection via Adversarial and Supervised Contrastive Learning with Inception-Attention Network

    cs.LG 2026-04 unverdicted novelty 4.0

    A model using Inception-attention, adversarial domain adaptation, and contrastive learning reaches 93.54% accuracy in three-class cross-subject muscle fatigue detection from sEMG signals.

  14. Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

    cs.CV 2026-04 unverdicted novelty 3.0

    RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.