pith. machine review for the scientific record.

arxiv: 2605.01916 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: unknown

EAPFusion: Intrinsic Evolving Auxiliary Prior Guidance for Infrared and Visible Image Fusion

Authors on Pith no claims yet

Pith reviewed 2026-05-09 17:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords infrared visible image fusion · dynamic convolution · intrinsic priors · adaptive kernels · multi-modal fusion · channel mixing · semantic segmentation
0 comments

The pith

Self-evolving intrinsic priors generate adaptive kernels for state-of-the-art infrared-visible image fusion without external models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that infrared-visible fusion improves when static trained weights are replaced by a compact set of intrinsic priors that evolve across scales and condition dynamic convolutions to produce scene-specific kernels. This addresses the problem of methods failing to adapt to particular content at inference time while avoiding granularity issues from coarse external semantics. The approach also includes a channel-shuffling module to mix modalities locally. Experiments across datasets, including cross-dataset tests and downstream semantic segmentation, demonstrate superior fusion quality and consistent task gains. A sympathetic reader would care because better adaptive fusion directly aids perception in low-light or complex environments like driving and surveillance.

Core claim

EAPFusion maintains a compact set of intrinsic priors that progressively update across scales; these evolved priors then drive prior-conditioned dynamic convolution to generate instance-adaptive kernels on the fly, shifting away from fixed pre-trained filters, while a channel-level fusion module interleaves and mixes infrared and visible features to enhance complementarity.
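
To make the prior-evolution step concrete, here is a minimal PyTorch-style sketch of what a gated, cross-attention update of a compact prior set could look like, loosely following the Figure 2 caption. The class name, tensor shapes, head count, and gating form are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn


class AdaptivePriorGenerator(nn.Module):
    """Sketch of one prior-evolution step: prior tokens cross-attend to the
    current feature map, and a learned gate blends the attended summary with
    the previous priors. All dimensions and the gating form are assumptions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, priors: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # priors: (B, N, C) historical prior tokens; feats: (B, C, H, W).
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H*W, C) scene tokens
        summary, _ = self.attn(priors, tokens, tokens)       # priors query the scene
        g = self.gate(torch.cat([priors, summary], dim=-1))  # per-token gate in (0, 1)
        return g * summary + (1.0 - g) * priors              # gated evolution of priors
```

Calling AdaptivePriorGenerator(64) on priors of shape (2, 4, 64) and features of shape (2, 64, 32, 32) returns an updated (2, 4, 64) prior set for the next scale to consume.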

What carries the argument

Self-evolving intrinsic priors updated across scales, which condition dynamic convolution to produce instance-adaptive kernels, paired with channel shuffling and local mixing for cross-modal fusion.
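
A similarly hedged sketch of the prior-conditioned dynamic convolution follows: each prior token is mapped to a depthwise expert kernel (a stand-in for the WeightGen step named in the Figure 3 caption), and a lightweight router produces per-location mixing weights over the experts via dense routing. The Top-K variant is omitted, and every name and hyperparameter here is a placeholder rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorConditionedDynConv(nn.Module):
    """Sketch of prior-conditioned dynamic convolution: N prior tokens yield N
    depthwise expert kernels; a per-pixel router mixes the expert responses."""

    def __init__(self, channels: int, num_priors: int = 4, kernel_size: int = 3):
        super().__init__()
        self.c, self.n, self.k = channels, num_priors, kernel_size
        # WeightGen analogue: prior token -> one depthwise kernel per channel.
        self.weight_gen = nn.Linear(channels, channels * kernel_size * kernel_size)
        # Router: per-pixel logits over the N experts (dense routing).
        self.router = nn.Conv2d(channels, num_priors, kernel_size=1)

    def forward(self, x: torch.Tensor, priors: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) features; priors: (B, N, C) evolved prior tokens.
        b, c, h, w = x.shape
        kernels = self.weight_gen(priors).view(b, self.n, c, 1, self.k, self.k)
        route = torch.softmax(self.router(x), dim=1)            # (B, N, H, W)
        out = torch.zeros_like(x)
        for i in range(self.n):
            # Batched depthwise conv: fold the batch into the group dimension.
            w_i = kernels[:, i].reshape(b * c, 1, self.k, self.k)
            y = F.conv2d(x.reshape(1, b * c, h, w), w_i,
                         padding=self.k // 2, groups=b * c).view(b, c, h, w)
            out = out + route[:, i:i + 1] * y                   # location-dependent mix
        return out
```

The per-pixel softmax over expert kernels is what makes the effective filter scene- and location-specific even though the prior set itself stays compact.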

If this is right

  • Fused outputs will more effectively highlight thermal targets while retaining fine visible textures in varied scenes.
  • Downstream tasks such as semantic segmentation will receive measurable performance improvements from the higher-quality inputs.
  • Cross-dataset generalization will hold because the priors adapt at inference rather than relying on dataset-specific training.
  • Reliance on large external pre-trained models for guidance can be reduced without sacrificing fusion quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The compact prior design may support deployment on edge devices where external model calls are costly.
  • The same evolving-prior mechanism could apply to other multi-modal fusion problems such as RGB-depth or medical imaging pairs.
  • Further scale variations in the prior update process might yield additional robustness to extreme lighting changes.

Load-bearing premise

A compact set of self-evolving intrinsic priors can capture sufficient scene-specific detail to replace external auxiliary models and resolve granularity mismatch while still generating better kernels than static weights.

What would settle it

If, on a new held-out dataset, the method showed no gain over the strongest static baseline on either standard fusion metrics or downstream segmentation accuracy, the superiority of the evolving-prior approach would be refuted.

Figures

Figures reproduced from arXiv: 2605.01916 by Axi Niu, Haishen Wang, Luobin Zhang, Qingsen Yan, Zhenyu Sun.

Figure 1. Overall architecture of EAPFusion: a dual-branch encoder, APG, DDCB, SCFB, and a progressive decoder for infrared-visible fusion.
Figure 2. Adaptive Prior Generator (APG): historical priors are aligned, summarized with current features via cross-attention, and updated by a gated evolution rule.
Figure 3. Prior-Driven Dynamic Convolution Block (DDCB): each prior token generates an expert kernel via WeightGen, and dense/Top-K routing yields location-dependent mixing weights for dynamic convolution.
Figure 4. Shuffle Channel Fusion Block (SCFB): FiLM-based cross-modal gating, channel shuffling, local channel mixing, and multi-branch interaction are followed by a 1 × 1 projection.
Figure 5. Channel-wise mixing convolution (CWMC): a sliding-window 1D convolution along channels is folded and projected by 1×1 layers to obtain local channel interactions.
Figure 6. Qualitative fusion comparison on MSRS and M3FD: columns show VIS, IR, and fused results of CDDFuse, EMMA, MaeFuse, SPDFusion, SwinFusion, T2EA, TDFusion, WaveFusion, and Ours; boxed regions are enlarged.
Figure 7. Qualitative segmentation comparison in daytime (top) and nighttime (bottom) scenes: columns show source VIS/IR images and segmentation maps from different fusion methods; boxed regions highlight differences. VIS: visible; IR: infrared.
read the original abstract

Infrared-visible image fusion aims to create an information-rich fused image by integrating the complementary thermal saliency from infrared sensing and fine textures from visible imaging. Such accurate fusion is essential for real-world perception applications in complex scenes, including nighttime autonomous driving, search and rescue, and surveillance, and can further benefit downstream tasks such as semantic segmentation. However, most existing fusion methods rely upon static trained weights that cannot adapt to scene-specific content at inference time, and often suffer from a granularity mismatch when coarse auxiliary semantics are injected, which makes it difficult to simultaneously highlight targets and preserve details. In this work, we propose EAPFusion to address these issues by using self-evolving intrinsic priors instead of relying on external auxiliary models. Concretely, EAPFusion maintains a compact set of intrinsic priors and progressively updates them across scales. These evolved priors are utilized to dynamically generate convolutional kernels, shifting the paradigm from fixed, pre-trained filters to instance-adaptive parameters via prior-conditioned dynamic convolution. Furthermore, we design a channel-level fusion module that shuffles and interleaves infrared and visible channels, applying local channel mixing to boost cross-modal complementarity. Experiments on different datasets, including cross-dataset evaluation and semantic segmentation, show that the proposed method achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance. Code is coming soon.
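
For the channel-level fusion module, a minimal sketch of the shuffle-and-mix idea is given below, under the assumption that it resembles the SCFB and CWMC figure captions: same-scale IR and VIS features are interleaved along the channel axis, neighbouring channels are mixed with a sliding-window 1D convolution, and a 1×1 projection maps back to the original width. The FiLM gating and multi-branch interaction are omitted, and the window size is a guess.

```python
import torch
import torch.nn as nn


class ShuffleChannelFusion(nn.Module):
    """Sketch of channel-level fusion: interleave IR and VIS channels, mix
    neighbouring channels with a sliding-window 1D convolution (CWMC-style),
    and project back to the original channel width."""

    def __init__(self, channels: int, window: int = 3):
        super().__init__()
        # 1D conv that slides along the channel axis of each pixel.
        self.mix = nn.Conv1d(1, 1, kernel_size=window, padding=window // 2)
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # ir, vis: (B, C, H, W) same-scale features from the two branches.
        b, c, h, w = ir.shape
        x = torch.stack([ir, vis], dim=2).reshape(b, 2 * c, h, w)  # interleave channels
        flat = x.permute(0, 2, 3, 1).reshape(b * h * w, 1, 2 * c)  # channels as a sequence
        mixed = self.mix(flat).reshape(b, h, w, 2 * c).permute(0, 3, 1, 2)
        return self.proj(mixed)                                    # back to C channels
```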

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes EAPFusion for infrared-visible image fusion. It replaces static trained weights and external auxiliary models with a compact set of self-evolving intrinsic priors that are progressively updated across scales; these priors condition dynamic convolution to produce instance-adaptive kernels. A channel-level fusion module that shuffles and interleaves infrared and visible channels is introduced to enhance cross-modal complementarity. Experiments on multiple datasets (including cross-dataset evaluation) and downstream semantic segmentation are claimed to demonstrate state-of-the-art quantitative and qualitative fusion results together with consistent gains on the downstream task.

Significance. If the experimental claims are substantiated, the shift from fixed weights to prior-conditioned dynamic kernels could meaningfully advance adaptive multimodal fusion for real-world perception tasks. The emphasis on intrinsic, evolving priors without external models addresses a recognized granularity mismatch and is a clear conceptual contribution. Reproducibility is supported by the stated intent to release code.

major comments (2)
  1. [Abstract] The central claim that the method 'achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance' is unsupported by any numerical metrics, ablation tables, baseline comparisons, or error analysis in the provided text. This absence is load-bearing because the contribution is framed entirely around empirical superiority.
  2. [Method] The self-evolving intrinsic priors and their use to generate dynamic kernels are described at a high level without equations, update rules, or parameter counts. It is therefore impossible to verify whether the approach is truly parameter-free or avoids the circularity of fitting to the same data it claims to generalize over.
minor comments (2)
  1. The abstract states 'Code is coming soon' but provides no repository link or timeline; this should be updated with concrete availability information.
  2. Dataset names and the exact number of test sets used for cross-dataset evaluation should be stated explicitly in the abstract rather than left as 'different datasets'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's insightful comments on our manuscript. We appreciate the positive assessment of the conceptual contribution regarding intrinsic evolving priors. We provide point-by-point responses to the major comments below, indicating where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance' is unsupported by any numerical metrics, ablation tables, baseline comparisons, or error analysis in the provided text. This absence is load-bearing because the contribution is framed entirely around empirical superiority.

    Authors: The abstract is intended as a high-level summary of the paper's contributions and results. The full manuscript includes extensive experimental validation with quantitative metrics (e.g., PSNR, SSIM, LPIPS), ablation tables, comparisons against multiple baselines on several datasets, cross-dataset evaluations, and downstream semantic segmentation results with mIoU improvements. To strengthen the abstract and directly address this concern, we will revise it to include specific numerical highlights of the performance gains while maintaining conciseness. revision: yes

  2. Referee: [Method] The self-evolving intrinsic priors and their use to generate dynamic kernels are described at a high level without equations, update rules, or parameter counts. It is therefore impossible to verify whether the approach is truly parameter-free or avoids the circularity of fitting to the same data it claims to generalize over.

    Authors: We acknowledge that the method description in the current version may appear high-level. The manuscript details the self-evolving process with update rules for the intrinsic priors across multiple scales, the mathematical formulation of the prior-conditioned dynamic convolution (including equations for kernel generation and channel mixing), and the small fixed parameter count of the prior set. The evolution is performed in a self-supervised, inference-time manner based on the input image content without any retraining or access to labels, ensuring no circularity with the test data. We will expand the method section with additional equations, a clear parameter analysis, and pseudocode to make these aspects fully verifiable. revision: yes
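
Taken at face value, the rebuttal describes a feed-forward, label-free evolution of the prior set across encoder scales rather than any test-time retraining. The speculative sketch below chains the AdaptivePriorGenerator and PriorConditionedDynConv sketches from earlier on this page into such a loop; the scale count, channel widths, and learned initial priors are placeholders with no counterpart in the supplied text.

```python
import torch
import torch.nn as nn

# Relies on the AdaptivePriorGenerator and PriorConditionedDynConv sketches
# above being in scope; all sizes below are illustrative guesses.


class EvolvingPriorFusion(nn.Module):
    def __init__(self, channels=(32, 64, 128), num_priors: int = 4):
        super().__init__()
        self.init_priors = nn.Parameter(torch.randn(1, num_priors, channels[0]))
        self.apg = nn.ModuleList(AdaptivePriorGenerator(c) for c in channels)
        self.ddcb = nn.ModuleList(PriorConditionedDynConv(c) for c in channels)
        # Project prior tokens to the next scale's channel width.
        self.expand = nn.ModuleList(
            nn.Linear(channels[i], channels[i + 1]) for i in range(len(channels) - 1))

    def forward(self, feats_per_scale):
        # feats_per_scale: list of (B, C_s, H_s, W_s) encoder features, one per scale;
        # the same compact prior set is evolved at every scale, with no labels involved.
        priors = self.init_priors.expand(feats_per_scale[0].shape[0], -1, -1)
        outs = []
        for s, feats in enumerate(feats_per_scale):
            priors = self.apg[s](priors, feats)       # evolve priors from this scale
            outs.append(self.ddcb[s](feats, priors))  # scene-specific dynamic kernels
            if s + 1 < len(feats_per_scale):
                priors = self.expand[s](priors)       # match the next scale's width
        return outs
```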

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and method description introduce self-evolving intrinsic priors and dynamic kernel generation for image fusion, but present no equations, derivations, or load-bearing steps that reduce by construction to fitted parameters, self-definitions, or self-citation chains. Claims rest on experimental results across datasets rather than a closed logical loop. No specific reductions (e.g., a prediction equaling its input by definition) are identifiable from the supplied text, so the claims stand or fall against external benchmarks rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract introduces 'intrinsic priors' and 'prior-conditioned dynamic convolution' without specifying numerical hyperparameters or external axioms; evaluation is limited to the high-level description.

invented entities (1)
  • self-evolving intrinsic priors (no independent evidence)
    purpose: to provide scene-adaptive guidance for generating dynamic convolutional kernels without external auxiliary models
    Presented as the core replacement for static weights and coarse external semantics

pith-pipeline@v0.9.0 · 5548 in / 1098 out tokens · 28250 ms · 2026-05-09T17:22:33.703941+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 38 canonical work pages

  1. [1]

    K. Liu, M. Li, C. Chen, C. Rao, E. Zuo, Y. Wang, Z. Yan, B. Wang, C. Chen, X. Lv, DSFusion: infrared and visible image fusion method combining detail and scene information, Pattern Recognit. 154 (2024) 110633. https://doi.org/10.1016/j.patcog.2024.110633

  2. [2]

    X. Luo, J. Wang, Z. Zhang, X.-j. Wu, A full-scale hierarchical encoder-decoder network with cascading edge-prior for infrared and visible image fusion, Pattern Recognit. 148 (2024) 110192. https://doi.org/10.1016/j.patcog.2023.110192

  3. [3]

    Z. Zhao, S. Xu, C. Zhang, J. Liu, J. Zhang, P. Li, DIDFuse: deep image decomposition for infrared and visible image fusion, in: Proc. 29th Int. Joint Conf. Artif. Intell. (IJCAI), 2020, pp. 970–976. https://doi.org/10.24963/ijcai.2020/135

  4. [5]

    B. Zheng, R. Wang, X. Liu, J. Li, A multi-level detection guided and co-encoding network for infrared and visible image fusion, Pattern Recognit. 168 (2025) 111778. https://doi.org/10.1016/j.patcog.2025.111778

  5. [7]

    J. Zhang, K. He, D. Xu, H. Shi, CLIP-based natural language-guided low-redundancy fusion of infrared and visible images, IEEE Trans. Consum. Electron. 71 (2025) 931–944. https://doi.org/10.1109/TCE.2025.3526792

  6. [8]

    Z. Zhao, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, L. Van Gool, Image fusion via vision-language model, in: Proc. Int. Conf. Mach. Learn. (ICML), 2024, pp. 60749–60765. https://doi.org/10.5555/3692070.3694583

  7. [9]

    H. Li, X.-J. Wu, DenseFuse: a fusion approach to infrared and visible images, IEEE Trans. Image Process. 28 (5) (2019) 2614–2623. https://doi.org/10.1109/TIP.2018.2887342

  8. [10]

    H. Li, X.-J. Wu, J. Kittler, RFN-Nest: an end-to-end residual fusion network for infrared and visible images, Inf. Fusion 73 (2021) 72–86. https://doi.org/10.1016/j.inffus.2021.02.023

  9. [11]

    J. Ma, W. Yu, P. Liang, C. Li, J. Jiang, FusionGAN: a generative adversarial network for infrared and visible image fusion, Inf. Fusion 48 (2019) 11–26. https://doi.org/10.1016/j.inffus.2018.09.004

  10. [12]

    J. Li, H. Huo, C. Li, R. Wang, Q. Feng, AttentionFGAN: infrared and visible image fusion using attention-based generative adversarial networks, IEEE Trans. Multimed. 23 (2021) 1383–1396. https://doi.org/10.1109/TMM.2020.2997127

  11. [13]

    W. Tang, F. He, Y. Liu, ITFuse: an interactive transformer for infrared and visible image fusion, Pattern Recognit. 156 (2024) 110822. https://doi.org/10.1016/j.patcog.2024.110822

  12. [14]

    J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, Y. Ma, SwinFusion: cross-domain long-range learning for general image fusion via Swin Transformer, IEEE/CAA J. Autom. Sinica 9 (7) (2022) 1200–1217. https://doi.org/10.1109/JAS.2022.105686

  13. [15]

    Z. Zhao, H. Bai, Y. Zhu, J. Zhang, S. Xu, Y. Zhang, K. Zhang, D. Meng, R. Timofte, L. Van Gool, DDFM: denoising diffusion model for multi-modality image fusion, in: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023, pp. 8048–8059. https://doi.org/10.1109/ICCV51070.2023.00742

  14. [16]

    H. Xu, R. Nie, J. Cao, M. Tan, Z. Ding, MADMFuse: a multi-attribute diffusion model to fuse infrared and visible images, Digit. Signal Process. 155 (2024) 104741. https://doi.org/10.1016/j.dsp.2024.104741

  15. [17]

    J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, Z. Luo, Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 5792–5801. https://doi.org/10.1109/CVPR52688.2022.00571

  16. [18]

    H. Bai, J. Zhang, Z. Zhao, Y. Wu, L. Deng, Y. Cui, T. Feng, S. Xu, Task-driven image fusion with learnable fusion loss, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 7457–7468. https://doi.org/10.1109/CVPR52734.2025.00699

  17. [19]

    J. Liu, B. Zhang, Q. Mei, X. Li, Y. Zou, Z. Jiang, L. Ma, R. Liu, X. Fan, DCEvo: discriminative cross-dimensional evolutionary learning for infrared and visible image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 2226–2235. https://doi.org/10.1109/CVPR52734.2025.00213

  18. [20]

    X. Li, Y. Zou, J. Liu, Z. Jiang, L. Ma, X. Fan, R. Liu, From text to pixels: a context-aware semantic synergy solution for infrared and visible image fusion, arXiv (Jan. 2024). arXiv:2401.00421, https://doi.org/10.48550/arXiv.2401.00421. URL https://arxiv.org/abs/2401.00421

  19. [21]

    X. Li, J. Wang, W. Chen, R. Chen, G. Zhang, L. Cheng, Infrared and visible image fusion based on text-image core-semantic alignment and interaction, Digit. Signal Process. 163 (2025) 105203. https://doi.org/10.1016/j.dsp.2025.105203

  20. [22]

    J. Liu, X. Li, Z. Wang, Z. Jiang, W. Zhong, W. Fan, B. Xu, PromptFusion: harmonized semantic prompt learning for infrared and visible image fusion, IEEE/CAA J. Autom. Sinica 12 (3) (2025) 502–515. https://doi.org/10.1109/JAS.2024.124878

  21. [23]

    X. Yi, H. Xu, H. Zhang, L. Tang, J. Ma, Text-IF: leveraging semantic text guidance for degradation-aware and interactive image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 27016–27025. https://doi.org/10.1109/CVPR52733.2024.02552

  22. [24]

    G. Wu, H. Liu, H. Fu, Y. Peng, J. Liu, X. Fan, R. Liu, Every SAM drop counts: embracing semantic priors for multi-modality image fusion and beyond, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 17882–17891. https://doi.org/10.1109/CVPR52734.2025.01666

  23. [25]

    C. Cheng, T. Xu, X.-J. Wu, MUFusion: a general unsupervised image fusion network based on memory unit, Inf. Fusion 92 (2023) 80–92. https://doi.org/10.1016/j.inffus.2022.11.010

  24. [26]

    J. He, X. Luo, Z. Zhang, X.-j. Wu, MemoryFusion: a novel architecture for infrared and visible image fusion based on memory unit, Pattern Recognit. 170 (2026) 112004. https://doi.org/10.1016/j.patcog.2025.112004

  25. [27]

    X. Yi, Y. Zhang, X. Xiang, Q. Yan, H. Xu, J. Ma, LUT-Fuse: towards extremely fast infrared and visible image fusion via distillation to learnable look-up tables, in: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2025, pp. 14559–14568

  26. [28]

    H. Li, Z. Yang, Y. Zhang, W. Jia, Z. Yu, Y. Liu, MulFS-CAP: multimodal fusion-supervised cross-modality alignment perception for unregistered infrared-visible image fusion, IEEE Trans. Pattern Anal. Mach. Intell. 47 (5) (2025) 3673–3690. https://doi.org/10.1109/TPAMI.2025.3535617

  27. [29]

    S. Huang, S. Su, J. Wei, L. Hu, Z. Cheng, Vector-quantized dual-branch fusion network for robust image fusion and anomaly suppression, Inf. Fusion 126 (2026) 103630. https://doi.org/10.1016/j.inffus.2025.103630

  28. [30]

    L. Tang, H. Zhang, H. Xu, J. Ma, Rethinking the necessity of image fusion in high-level vision tasks: a practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity, Inf. Fusion 99 (2023) 101870. https://doi.org/10.1016/j.inffus.2023.101870

  29. [31]

    M. Lou, Y. Yu, OverLoCK: an overview-first-look-closely-next ConvNet with context-mixing dynamic kernels, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 128–138. https://doi.org/10.1109/CVPR52734.2025.00021

  30. [32]

    Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612. https://doi.org/10.1109/TIP.2003.819861

  31. [33]

    L. Tang, J. Yuan, H. Zhang, X. Jiang, J. Ma, PIAFusion: a progressive infrared and visible image fusion network based on illumination aware, Inf. Fusion 83–84 (2022) 79–92. https://doi.org/10.1016/j.inffus.2022.03.007

  32. [34]

    A. Toet, M. A. Hogervorst, Progress in color night vision, Optical Engineering 51 (2012) 010901. https://doi.org/10.1117/1.OE.51.1.010901

  33. [35]

    Z. Zhao, H. Bai, J. Zhang, Y. Zhang, K. Zhang, S. Xu, D. Chen, R. Timofte, L. Van Gool, Equivariant multi-modality image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 25912–25921. https://doi.org/10.1109/CVPR52733.2024.02448

  34. [36]

    Z. Huang, C. Lin, B. Xu, M. Xia, Q. Li, Y. Li, N. Sang, T2EA: target-aware Taylor expansion approximation network for infrared and visible image fusion, IEEE Trans. Circuits Syst. Video Technol. 35 (5) (2025) 4831–. https://doi.org/10.1109/TCSVT.2024.3524794

  36. [38]

    Q. Wang, Z. Li, S. Zhang, N. Chi, Q. Dai, WaveFusion: a novel wavelet vision transformer with saliency-guided enhancement for multimodal image fusion, IEEE Trans. Circuits Syst. Video Technol. 35 (8) (2025) 7526–7542. https://doi.org/10.1109/TCSVT.2025.3549459

  37. [39]

    Z. Zhao, H. Bai, J. Zhang, Y. Zhang, S. Xu, Z. Lin, R. Timofte, L. Van Gool, CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 5906–5916. https://doi.org/10.1109/CVPR52729.2023.00572

  38. [40]

    J. Li, J. Jiang, P. Liang, J. Ma, L. Nie, MaeFuse: transferring omni features with pretrained masked autoencoders for infrared and visible image fusion via guided training, IEEE Trans. Image Process. 34 (2025) 1340–1353. https://doi.org/10.1109/TIP.2025.3541562

  39. [41]

    Q. Xiao, H. Jin, H. Su, Y. Zhang, Z. Xiao, B. Wang, SPDFusion: a semantic prior knowledge-driven method for infrared and visible image fusion, IEEE Trans. Multimed. 27 (2025) 1691–1705. https://doi.org/10.1109/TMM.2024.3521848

  40. [42]

    L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Computer Vision – ECCV 2018, Vol. 11211, Springer, Cham, 2018, pp. 833–851. https://doi.org/10.1007/978-3-030-01234-2_49

  41. [43]

    M. Contributors, MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark, https://github.com/open-mmlab/mmsegmentation (2020)