DualGate-Net: A Prior-Gated Dual-Encoder Framework for Histopathology Cell Detection

Atul Sajjanhar; Bahman Jafari Tabaghsar; K. Devaraja; Son Tran

arxiv: 2606.07222 · v1 · pith:JCCJPSVDnew · submitted 2026-06-05 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-27 22:38 UTC · model grok-4.3

keywords histopathology · cell detection · dual-encoder · prior-gated fusion · tissue context · OCELOT benchmark · auxiliary reconstruction · ConvNeXtV2

The pith

DualGate-Net combines local and global encoders with learnable prior-gated fusion to adaptively incorporate tissue context for cell detection in histopathology images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DualGate-Net as a dual-encoder architecture that pairs a ConvNeXtV2 local encoder with a SegFormer global encoder. These are joined by a learnable fusion module that decides at each location how much tissue prior information to blend in, avoiding the fixed mixing used in earlier tissue-aware detectors. An auxiliary foreground reconstruction branch is added to retain fine cellular details during training, along with cellness-guided cues for better localization. On the OCELOT benchmark the approach records macro F1 scores of 0.7722 on validation and 0.7345 on test. A sympathetic reader would care because context-dependent cell classification is a recurring obstacle in pathology image analysis.

Core claim

DualGate-Net combines a ConvNeXtV2-based local encoder and a SegFormer-based global encoder through a learnable prior-gated fusion mechanism that adaptively regulates the influence of tissue priors across spatial locations. An auxiliary foreground reconstruction branch preserves high-frequency cellular structures during training, and auxiliary cellness-guided cues further improve localization robustness. Experiments on the OCELOT benchmark demonstrate consistent improvements, achieving macro F1-scores of 0.7722 on the validation set and 0.7345 on the test set.

What carries the argument

The learnable prior-gated fusion mechanism that adaptively regulates the influence of tissue priors across spatial locations.
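
As a rough illustration of what per-location gating can look like, the sketch below fuses a local feature map with a tissue-prior feature map using a sigmoid gate computed from both. It is not the authors' module: the linear gate over concatenated channels, the residual form `local + gate * prior`, and all names (`prior_gated_fusion`, `w_gate`, `b_gate`) are assumptions made for illustration. In a trained model the gate weights would be learned; here they are random.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prior_gated_fusion(local_feat, prior_feat, w_gate, b_gate):
    """Fuse two (H, W, C) feature maps with one scalar gate per location.

    Hypothetical formulation (not taken from the paper):
        gate  = sigmoid([local, prior] . w_gate + b_gate)   -> (H, W)
        fused = local + gate * prior                         -> (H, W, C)
    """
    stacked = np.concatenate([local_feat, prior_feat], axis=-1)  # (H, W, 2C)
    gate = sigmoid(stacked @ w_gate + b_gate)                    # (H, W)
    fused = local_feat + gate[..., None] * prior_feat
    return fused, gate

# Toy inputs standing in for encoder outputs.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
local_feat = rng.normal(size=(H, W, C))
prior_feat = rng.normal(size=(H, W, C))
w_gate = rng.normal(size=2 * C) * 0.1  # would be learned in practice
b_gate = 0.0

fused, gate = prior_gated_fusion(local_feat, prior_feat, w_gate, b_gate)
```

The `gate` array is the kind of spatial map one could visualize to see where the tissue prior is admitted or suppressed. A gate near zero leaves the local features untouched, which is the behavior a static fusion cannot provide.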

If this is right

Adaptive per-location regulation of priors reduces noise propagation relative to static fusion strategies.
The auxiliary foreground reconstruction branch maintains high-frequency cellular structures that would otherwise be lost.
Cellness-guided cues add localization robustness on top of the gated fusion.
The reported macro F1 scores of 0.7722 validation and 0.7345 test represent measurable gains on the OCELOT benchmark.
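
For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores. The sketch below computes it from per-class true-positive, false-positive, and false-negative counts. The class names and counts are made up for illustration, and the matching of detections to ground-truth cells (which produces those counts) is assumed to have been done already under the benchmark's own protocol.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def macro_f1(counts):
    """Unweighted mean of per-class F1.

    counts: dict mapping class name -> (tp, fp, fn).
    """
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)

# Hypothetical counts, not results from the paper.
example_counts = {
    "class_a": (80, 20, 20),
    "class_b": (30, 10, 30),
}
score = macro_f1(example_counts)
```

Because each class contributes equally regardless of how many cells it contains, a weak minority class pulls the macro score down more than it would a pooled (micro) F1.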

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same learnable gating pattern could be inserted into other dual-stream medical imaging models that currently use fixed context fusion.
Evaluating the framework on additional histopathology cohorts with different staining protocols would test whether the adaptive regulation generalizes beyond OCELOT.
The spatial maps produced by the gate itself could be inspected to identify which tissue microenvironments most strongly influence particular cell classes.

Load-bearing premise

The learnable prior-gated fusion module will adaptively regulate tissue-prior influence across locations without propagating noise, and the auxiliary foreground reconstruction branch will reliably preserve high-frequency cellular structures.

What would settle it

Running an ablation on the OCELOT test set that removes the learnable gate or the auxiliary reconstruction branch and shows no drop in macro F1, or visual inspection of fused feature maps that reveals increased noise, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.07222 by Atul Sajjanhar, Bahman Jafari Tabaghsar, K. Devaraja, Son Tran.

**Figure 2.** Prior-gated fusion module. B(1) estimates a spatial reliability gate from …

**Figure 3.** Architecture of the auxiliary foreground reconstruction branch used dur…

**Figure 4.** Example visualization from the OCELOT benchmark. (a) Ground-truth …

Original abstract

Cell detection in histopathology images strongly depends on surrounding tissue context, where visually similar cells may belong to different classes under different microenvironments. Recent tissue-aware methods incorporate contextual priors, but often rely on static fusion strategies that may propagate noisy information. In this work, we propose DualGate-Net, a prior-aware dual-encoder framework that combines a ConvNeXtV2-based local encoder and a SegFormer-based global encoder through a learnable prior-gated fusion mechanism. The proposed module adaptively regulates the influence of tissue priors across spatial locations, while an auxiliary foreground reconstruction branch preserves high-frequency cellular structures during training. In addition, auxiliary cellness-guided cues are incorporated to further improve localization robustness. Experiments on the OCELOT benchmark demonstrate consistent improvements, achieving macro F1-scores of 0.7722 on the validation set and 0.7345 on the test set, highlighting the effectiveness of adaptive prior integration for robust histopathology cell detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DualGate-Net pairs two standard encoders with a gated fusion and auxiliary branches but the abstract gives no ablations or baselines to show those pieces matter.

The main point is that this paper puts forward a dual-encoder setup for histopathology cell detection that fuses a ConvNeXtV2 local branch and a SegFormer global branch through a learnable prior-gated module, plus auxiliary reconstruction and cellness branches. It reports macro F1 of 0.7722 on validation and 0.7345 on test for the OCELOT benchmark.

What is actually new is the concrete combination of those two backbones with the gated fusion described as adaptive per location. The auxiliary foreground reconstruction is meant to keep high-frequency cell details while the gate is supposed to limit noisy tissue priors. The abstract frames this as an improvement over static fusion approaches, which is a fair motivation.

The paper does a reasonable job laying out the clinical context where surrounding tissue affects cell class and stating the architecture clearly enough that someone could reimplement the high-level idea.

The soft spots are the missing pieces that make the numbers hard to interpret. There are no baseline comparisons to prior tissue-aware detectors, no ablation tables removing the gate or the auxiliary branch, no error bars, and no visualizations or frequency metrics to check if the gate actually suppresses noise or the reconstruction branch preserves details. The stress-test note is on target here: without those checks it is difficult to credit the F1 scores to the proposed mechanisms rather than backbone choice or training details. The circularity burden is low since the results are empirical, but the soundness is limited by the lack of protocol details.

This work is for people already working on cell detection in medical images who might want to test a gated fusion variant. A reader in that narrow area could get a usable architecture sketch, but it is unlikely to interest broader computer vision or change practice without stronger evidence.

I would recommend sending it for peer review once the authors add the ablations, comparisons, and mechanism checks; the core proposal is simple enough that referees could assess it directly.

Referee Report

3 major / 0 minor

Summary. The paper proposes DualGate-Net, a dual-encoder architecture combining a ConvNeXtV2 local encoder and SegFormer global encoder via a learnable prior-gated fusion module, plus an auxiliary foreground reconstruction branch and cellness-guided cues, for context-aware cell detection in histopathology images. It reports macro F1 scores of 0.7722 on the OCELOT validation set and 0.7345 on the test set, attributing gains to adaptive regulation of tissue priors and preservation of high-frequency structures.

Significance. If the empirical gains can be rigorously attributed to the proposed mechanisms rather than training choices or backbone, the work addresses a practical need for robust tissue-context integration in computational pathology. The absence of ablations, baselines, and mechanism-specific diagnostics in the current presentation prevents assessing whether the adaptive fusion and auxiliary branch deliver the claimed benefits.

major comments (3)

[Abstract] The reported macro F1 scores of 0.7722/0.7345 are presented without baseline comparisons, statistical tests, error bars, ablation studies, or details on dataset splits and training protocol. This makes it impossible to determine whether improvements stem from the learnable prior-gated fusion or auxiliary branch rather than other factors.
[Abstract, framework description] The central claim that the learnable prior-gated fusion 'adaptively regulates the influence of tissue priors across spatial locations' without propagating noise lacks supporting evidence such as gating visualizations, attention maps, or noise-injection ablations. Without these, attribution of the F1 gains to this module cannot be verified.
[Abstract, framework description] The auxiliary foreground reconstruction branch is asserted to 'preserve high-frequency cellular structures,' yet no frequency-domain metrics, reconstruction error analysis, or targeted ablations are reported to confirm this behavior or its contribution to detection performance.
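
One common way to attach the error bars requested above is a percentile bootstrap over test images. The sketch below is a minimal version under stated assumptions: the resampling unit (whole images), the per-image count format, and the names `bootstrap_ci` and `pooled` are illustrative choices, not the paper's or the benchmark's protocol, and the toy counts are invented.

```python
import random

def macro_f1(total_counts):
    """total_counts: list of (tp, fp, fn) tuples, one per class."""
    scores = []
    for tp, fp, fn in total_counts:
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def pooled(per_image):
    """Sum counts over images; per_image[i][k] = (tp, fp, fn) for image i, class k."""
    n_classes = len(per_image[0])
    return [
        tuple(sum(img[k][j] for img in per_image) for j in range(3))
        for k in range(n_classes)
    ]

def bootstrap_ci(per_image, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for macro F1, resampling images with replacement."""
    rng = random.Random(seed)
    n = len(per_image)
    stats = []
    for _ in range(n_boot):
        sample = [per_image[rng.randrange(n)] for _ in range(n)]
        stats.append(macro_f1(pooled(sample)))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented per-image counts for two classes (illustration only).
toy = [
    [(8, 2, 1), (3, 1, 2)],
    [(5, 1, 3), (4, 2, 1)],
    [(7, 3, 2), (2, 2, 3)],
    [(9, 1, 1), (5, 1, 1)],
]
point = macro_f1(pooled(toy))
lower, upper = bootstrap_ci(toy)
```

An interval like this would let a reader judge whether a difference between the full model and an ablated variant exceeds resampling noise.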

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the evidence supporting our claims.

Referee: [Abstract] The reported macro F1 scores of 0.7722/0.7345 are presented without baseline comparisons, statistical tests, error bars, ablation studies, or details on dataset splits and training protocol. This makes it impossible to determine whether improvements stem from the learnable prior-gated fusion or auxiliary branch rather than other factors.

Authors: We agree that the abstract is too concise to fully contextualize the results. The full manuscript contains baseline comparisons, dataset details, and training protocol in the experiments section. To address the concern directly, we will revise the abstract to briefly reference these elements and add error bars plus statistical significance testing to the reported F1 scores in the results. Revision: yes.

Referee: [Abstract, framework description] The central claim that the learnable prior-gated fusion 'adaptively regulates the influence of tissue priors across spatial locations' without propagating noise lacks supporting evidence such as gating visualizations, attention maps, or noise-injection ablations. Without these, attribution of the F1 gains to this module cannot be verified.

Authors: We acknowledge that the current manuscript lacks mechanism-specific diagnostics for the prior-gated fusion. We will add gating visualizations, attention maps, and noise-injection ablations in the revised version to provide direct evidence for the adaptive regulation claim and its contribution to the observed performance. Revision: yes.

Referee: [Abstract, framework description] The auxiliary foreground reconstruction branch is asserted to 'preserve high-frequency cellular structures,' yet no frequency-domain metrics, reconstruction error analysis, or targeted ablations are reported to confirm this behavior or its contribution to detection performance.

Authors: We agree that additional targeted analysis is needed to substantiate the auxiliary branch claim. In the revision, we will incorporate frequency-domain metrics, reconstruction error analysis, and ablations isolating the branch to confirm its effect on high-frequency structures and detection performance. Revision: yes.
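
A frequency-domain metric of the kind discussed here could be as simple as the share of spectral energy above a radial cutoff. The sketch below is one possible definition, offered as an assumption rather than anything the paper or the rebuttal specifies: the function name `high_freq_energy_ratio` and the cutoff of 0.25 cycles per pixel are hypothetical choices.

```python
import numpy as np

def high_freq_energy_ratio(image, cutoff=0.25):
    """Fraction of spectral energy at radial frequency above `cutoff`.

    image:  2-D array (e.g. one channel of a feature map or reconstruction).
    cutoff: radial frequency in cycles/pixel (Nyquist along one axis is 0.5).
    The mean is removed first so the DC term does not dominate.
    """
    spectrum = np.fft.fft2(image - image.mean())
    power = np.abs(spectrum) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)

# Two synthetic patterns: a slow horizontal sinusoid and a pixel checkerboard.
x = np.arange(32)
smooth = np.sin(2 * np.pi * x / 32)[None, :] * np.ones((32, 1))
checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)

smooth_ratio = high_freq_energy_ratio(smooth)
checker_ratio = high_freq_energy_ratio(checker)
```

Comparing this ratio between feature maps trained with and without the reconstruction branch would be one direct check on the claim that fine structure is preserved.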

Circularity Check

0 steps flagged

No circularity: empirical results on external benchmark with no self-referential derivations

The paper reports macro F1 scores of 0.7722/0.7345 on the OCELOT benchmark after describing a dual-encoder architecture with learnable gated fusion and auxiliary branches. No equations, fitted parameters, or derivation steps appear in the abstract or described framework that reduce any claimed output to an input by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and no predictions are statistically forced from subsets of the same data. The performance numbers are standard empirical evaluations on an independent external dataset, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The gated fusion and auxiliary branches are architectural choices whose effectiveness is asserted empirically.
