pith. machine review for the scientific record. sign in

arxiv: 2605.11434 · v1 · submitted 2026-05-12 · 📡 eess.IV

Recognition: 3 theorem links

· Lean Theorem

FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation

Jin Yang, Peijie Qiu, Xiaobing Yu

Pith reviewed 2026-05-13 00:45 UTC · model grok-4.3

classification 📡 eess.IV
keywords Vision TransformerMedical Image SegmentationVolumetric SegmentationFrequency DomainFeature FusionSelf-AttentionWavelet Transform3D Medical Imaging
0
0 comments X

The pith

FEFormer adds frequency modeling to Vision Transformers to capture fine local details and fuse features consistently for volumetric medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FEFormer, a modified Vision Transformer that incorporates frequency-domain operations to handle both broad context and precise anatomical structures in 3D medical scans. Standard transformers often miss small local patterns and struggle to blend encoder and decoder information without losing meaning, problems the authors target with four new modules. A sympathetic reader would care because better segmentation directly supports diagnosis and treatment planning from CT or MRI volumes. The approach is tested across four different segmentation tasks and reported to exceed prior methods while remaining computationally light.

Core claim

FEFormer jointly models global dependencies and fine-grained local features through frequency-enhanced components: FDSA uses locality-preserving convolution with frequency attention, FGMLP decomposes and gates low- and high-frequency signals, WAFF performs wavelet-guided adaptive fusion across encoder-decoder stages, and FCSB propagates low-level details via frequency-enabled cross-scale bridging. These mechanisms address the limitations of plain self-attention, spatial-agnostic MLPs, naive fusion, and missing low-level pathways, yielding improved segmentation accuracy and efficiency on volumetric data.

What carries the argument

The FEFormer architecture, whose four frequency-based modules (FDSA for attention, FGMLP for gating, WAFF for fusion, FCSB for bridging) explicitly process frequency information to preserve both global context and local structural details during encoding and decoding.

If this is right

  • Superior segmentation accuracy is obtained across four diverse volumetric medical tasks while maintaining lower computational cost than prior transformers.
  • FDSA enables joint capture of fine local details and long-range dependencies inside the attention block.
  • WAFF produces semantically consistent features when merging encoder and decoder representations.
  • FGMLP and FCSB together improve representation of low- and high-frequency content and low-level information flow.
  • The overall design supports generic knowledge extraction and adaptive fusion without task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The frequency-centric design might transfer to other dense prediction tasks such as 3D object detection in medical volumes or non-medical volumetric data like seismic imaging.
  • If the modules prove modular, they could be inserted into existing transformer backbones with minimal retraining to test gains on new datasets.
  • Explicit frequency decomposition suggests possible extensions to multi-modal fusion where one modality supplies low-frequency priors.
  • The efficiency claims imply the method could scale to higher-resolution scans or real-time clinical workflows once validated on larger cohorts.

Load-bearing premise

The four new modules actually overcome the stated shortcomings of ordinary Vision Transformers without needing extra derivations or detailed ablation evidence to confirm each one works as claimed.

What would settle it

If independent runs on the same four volumetric segmentation benchmarks show FEFormer matching or falling below current state-of-the-art accuracy or speed, or if ablating any single frequency module leaves performance unchanged, the central performance claim would not hold.

Figures

Figures reproduced from arXiv: 2605.11434 by Jin Yang, Peijie Qiu, Xiaobing Yu.

Figure 1
Figure 1. Figure 1: Visualization of low-frequency and high-frequency components decomposed by 3D Fast Fourier Transformation on (A) CT volumes and (B) T1w MR volumes. interactions among tokenized image patches (Dosovitskiy et al., 2021). While this global modeling capability is effective for capturing contextual information, it may come at the cost of diminished sensitivity to fine-grained spatial details (Ren et al., 2022; … view at source ↗
Figure 2
Figure 2. Figure 2: (A) The overall architecture of FEFormer. FEFormer consists of an encoder, a bottleneck, a decoder, and a Frequency￾enabled Cross-scale Stem Bridge (FCSB). The encoder employs a convolutional stem, and subsequently employs two consecutive Frequency-enhanced Transformer blocks and a patch merging layer at each stage. The decoder with a symmetry architecture employs a patch expanding layer and two Transforme… view at source ↗
Figure 3
Figure 3. Figure 3: The architecture of the Wavelet-guided Adaptive Feature Fusion (WAFF) module. WAFF takes input spatial features 𝑿1 and 𝑿2 , and employs DWT to decompose them into frequency-domain sub-bands: {𝑿 𝐿𝐿𝐿 1 , ..., 𝑿 𝐻𝐻𝐿 1 , 𝑿 𝐻𝐻𝐻 1 } and {𝑿 𝐿𝐿𝐿 2 , ..., 𝑿 𝐻𝐻𝐿 2 , 𝑿 𝐻𝐻𝐻 2 }. Subsequently, an adaptive feature fusion mechanism is employed to fuse each corresponding sub-bands. All fused sub-bands {𝑿 𝐿𝐿𝐿 , ..., 𝑿 𝐻𝐻𝐿 … view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison between FEFormer and (c) Att UNet, (d) nnU-Net, (e) nnFormer, (f) SegFormer, (g) Swin UNETR, (h) MedNext, and (i) VSmTrans. Results are shown across three public datasets, including the AMOS Abdominal Multi￾organ dataset, the Hepatic Vessel Tumor dataset, and the Brain Tumor dataset. Red boxes mark the regions where FEFormer demonstrates better segmentation results than other methods… view at source ↗
Figure 5
Figure 5. Figure 5: Road-map visualization of the cumulative impact of the proposed modules on segmentation performance and model complexity in the 2022 AMOS multi-organ segmentation task. The progressive incorporation of the FDSA, FGMLP, WAFF, and FCSB into the plain ViT architecture consistently improved the mean DSC from 84.08% to 90.11%. Meanwhile, the corresponding changes in computational cost demonstrated that the prop… view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of segmentation performance and generalization gaps between FEFormer and SOTA methods on the FLARE dataset for internal and external evaluation settings. Internal and external performance of all models are shown in DSC and HD95, along with the corresponding generalization gap. FEFormer consistently achieved superior segmentation accuracy (higher DSC and lower HD95) while exhibiting a notably sma… view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative comparison between (i) FEFormer and (c) Att UNet, (d) nnU-Net, (e) nnFormer, (f) SegFormer, (g) Swin UNETR, and (h) MedNext on the external evaluation on the FLARE dataset. Red boxes mark the regions where FEFormer demonstrates better segmentation results than other methods. of segmentation performance between internal and external evaluations were evaluated ( [PITH_FULL_IMAGE:figures/full_fig… view at source ↗
Figure 8
Figure 8. Figure 8: Visualization of components decomposed by 3D Discrete Wavelet Transformation on (A) CT volumes from the AMOS dataset and (B) T1w MR volumes from the BraTS dataset [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
read the original abstract

Accurate segmentation of organs and lesions in medical images is essential for clinical applications including diagnosis, prognosis, and treatment planning. While Vision Transformers (ViTs) have shown impressive segmentation performance, they face key challenges in module and architecture design. Specifically, self-attention struggles to capture fine-grained local features critical for understanding detailed anatomical structures, standard MLP modules lack explicit mechanisms to preserve spatial information, conventional encoder-decoder architectures rely on naive feature fusion strategies that cannot handle large semantic discrepancies, and existing designs lack explicit mechanisms to propagate low-level information from encoder to decoder. To address these limitations, we propose a Frequency-enhanced Vision Transformer (FEFormer) for robust and efficient volumetric medical image segmentation that explicitly models frequency information to jointly capture global context and fine structural details. FEFormer comprises four novel components: a Frequency-enhanced Dynamic Self-Attention (FDSA) module that jointly captures fine-grained local details and global long-range dependencies through locality-preserving convolution with frequency-domain attention; a Frequency-decomposed Gating MLP (FGMLP) that adaptively models low- and high-frequency components for enhanced semantic and structural representation; a Wavelet-guided Adaptive Feature Fusion (WAFF) module that enables semantically consistent encoder-decoder feature integration in the frequency domain; and a Frequency-enabled Cross-scale Stem Bridge (FCSB) that enhances low-level feature propagation across scales. Evaluated on four diverse volumetric medical image segmentation tasks, FEFormer achieved superior segmentation performance with high computational efficiency compared to state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript proposes FEFormer, a frequency-enhanced Vision Transformer for volumetric medical image segmentation. It introduces four modules—Frequency-enhanced Dynamic Self-Attention (FDSA) using locality-preserving convolution with frequency-domain attention, Frequency-decomposed Gating MLP (FGMLP) for adaptive low/high-frequency modeling, Wavelet-guided Adaptive Feature Fusion (WAFF) for semantically consistent encoder-decoder integration, and Frequency-enabled Cross-scale Stem Bridge (FCSB) for low-level feature propagation—to address limitations of standard ViTs in local feature capture and fusion. The work evaluates the model on four diverse volumetric segmentation tasks and claims superior performance alongside high computational efficiency versus state-of-the-art methods.

Significance. If the reported gains hold, the work offers a targeted advance in medical image segmentation by embedding explicit frequency-domain mechanisms into ViT designs, improving both fine-grained anatomical detail capture and encoder-decoder consistency while maintaining efficiency. Strengths include the concrete module definitions with frequency operations and the provision of quantitative efficiency metrics (FLOPs, parameters, inference time) alongside accuracy tables. This could inform subsequent architectures handling semantic gaps in volumetric tasks.

minor comments (4)
  1. §3.2 (FDSA module): the interaction between the convolution branch and frequency attention is described in text but would benefit from an explicit equation or pseudocode to clarify how locality is preserved while computing attention weights.
  2. Table 1 (main results): report standard deviations across runs or statistical significance tests for the Dice/HD95 improvements to strengthen the superiority claim over baselines.
  3. §4.3 (ablation study): the contribution of each module is shown via incremental addition, but the interaction effects between WAFF and FCSB are not isolated; a fuller factorial ablation would clarify independence.
  4. Figure 3 (qualitative results): the caption and legend should explicitly state the color mapping for ground-truth versus prediction overlays to aid reader interpretation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. The report accurately captures the core contributions of FEFormer, including the four frequency-aware modules and the evaluation across four volumetric tasks. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes an architectural proposal consisting of four frequency-enhanced modules (FDSA, FGMLP, WAFF, FCSB) to address stated limitations of standard ViTs. These modules are introduced via descriptive text in the abstract with no accompanying equations, derivations, or first-principles reductions. No predictions are claimed that reduce by construction to fitted inputs, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is presented as a derivation. The central claims rest on empirical segmentation performance across four tasks, which is independent of any internal definitional loop. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 4 invented entities

The central claim rests on the unproven effectiveness of four newly introduced modules whose internal frequency-domain mechanisms are described only at the level of high-level purpose statements in the abstract.

invented entities (4)
  • Frequency-enhanced Dynamic Self-Attention (FDSA) no independent evidence
    purpose: Jointly capture fine-grained local details and global long-range dependencies via locality-preserving convolution with frequency-domain attention
    New module postulated in the abstract without external validation or derivation
  • Frequency-decomposed Gating MLP (FGMLP) no independent evidence
    purpose: Adaptively model low- and high-frequency components for enhanced semantic and structural representation
    New module postulated in the abstract without external validation or derivation
  • Wavelet-guided Adaptive Feature Fusion (WAFF) no independent evidence
    purpose: Enable semantically consistent encoder-decoder feature integration in the frequency domain
    New module postulated in the abstract without external validation or derivation
  • Frequency-enabled Cross-scale Stem Bridge (FCSB) no independent evidence
    purpose: Enhance low-level feature propagation across scales
    New module postulated in the abstract without external validation or derivation

pith-pipeline@v0.9.0 · 5576 in / 1464 out tokens · 47547 ms · 2026-05-13T00:45:48.580765+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Nature communications , volume=

    Annotation-efficient deep learning for automatic medical image segmentation , author=. Nature communications , volume=. 2021 , publisher=

  2. [2]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

    Medical image segmentation review: The success of u-net , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2024 , publisher=

  3. [3]

    International Conference on Medical image computing and computer-assisted intervention , pages=

    U-net: Convolutional networks for biomedical image segmentation , author=. International Conference on Medical image computing and computer-assisted intervention , pages=. 2015 , organization=

  4. [4]

    IEEE transactions on medical imaging , volume=

    H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes , author=. IEEE transactions on medical imaging , volume=. 2018 , publisher=

  5. [5]

    Medical Imaging 2025: Computer-Aided Diagnosis , volume=

    Dynamic U-Net: adaptively calibrate features for abdominal multiorgan segmentation , author=. Medical Imaging 2025: Computer-Aided Diagnosis , volume=. 2025 , organization=

  6. [6]

    ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

    D2-mlp: dynamic decomposed mlp mixer for medical image segmentation , author=. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2025 , organization=

  7. [7]

    Biomedical Signal Processing and Control , volume=

    DMC-Net: Lightweight Dynamic Multi-scale and Multi-resolution convolution network for pancreas segmentation in CT images , author=. Biomedical Signal Processing and Control , volume=. 2025 , publisher=

  8. [8]

    arXiv preprint arXiv:2511.17873 , year=

    TransLK-Net: Entangling Transformer and Large Kernel for Progressive and Collaborative Feature Encoding and Decoding in Medical Image Segmentation , author=. arXiv preprint arXiv:2511.17873 , year=

  9. [9]

    Biomedical Signal Processing and Control , volume=

    D-net: Dynamic large kernel with dynamic feature fusion for volumetric medical image segmentation , author=. Biomedical Signal Processing and Control , volume=. 2026 , publisher=

  10. [10]

    International Conference on Learning Representations , year=

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=

  11. [11]

    Advances in Neural Information Processing Systems , volume=

    Class-aware adversarial transformers for medical image segmentation , author=. Advances in Neural Information Processing Systems , volume=

  12. [12]

    Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

    After-unet: Axial fusion transformer unet for medical image segmentation , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

  13. [13]

    Physics in Medicine & Biology , volume=

    CTA-UNet: CNN-transformer architecture UNet for dental CBCT images segmentation , author=. Physics in Medicine & Biology , volume=. 2023 , publisher=

  14. [14]

    International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=

    iSegFormer: interactive segmentation via transformers with application to 3D knee MR images , author=. International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=. 2022 , organization=

  15. [15]

    Radiology: Artificial Intelligence , volume=

    Optimizing performance of transformer-based models for fetal brain MR image segmentation , author=. Radiology: Artificial Intelligence , volume=. 2024 , publisher=

  16. [16]

    Medical physics , volume=

    SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images , author=. Medical physics , volume=. 2024 , publisher=

  17. [17]

    Medical image analysis , volume=

    FAT-Net: Feature adaptive transformers for automated skin lesion segmentation , author=. Medical image analysis , volume=. 2022 , publisher=

  18. [18]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Adaptive template transformer for mitochondria segmentation in electron microscopy images , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  19. [19]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Shunted self-attention via multi-scale token aggregation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  20. [20]

    IEEE Transactions on Image Processing , volume=

    S2AFormer: Strip Self-Attention for Efficient Vision Transformer , author=. IEEE Transactions on Image Processing , volume=. 2025 , publisher=

  21. [21]

    Information Fusion , volume=

    Multi-scale convolutional attention frequency-enhanced transformer network for medical image segmentation , author=. Information Fusion , volume=. 2025 , publisher=

  22. [22]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    PHA: Patch-wise high-frequency augmentation for transformer-based person re-identification , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  23. [23]

    International conference on machine learning , pages=

    Global context vision transformers , author=. International conference on machine learning , pages=. 2023 , organization=

  24. [24]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Lightweight vision transformer with spatial and channel enhanced self-attention , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  25. [25]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Swin transformer: Hierarchical vision transformer using shifted windows , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  26. [26]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

    Low-resolution self-attention for semantic segmentation , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

  27. [27]

    Information Fusion , pages=

    SU-RMT: Toward Bridging Semantic Representation and Structural Detail Modeling for Medical Image Segmentation , author=. Information Fusion , pages=. 2026 , publisher=

  28. [28]

    European conference on computer vision , pages=

    Improving vision transformers by revisiting high-frequency components , author=. European conference on computer vision , pages=. 2022 , organization=

  29. [29]

    The Fourteenth International Conference on Learning Representations , year=

    GmNet: Revisiting Gating Mechanisms From A Frequency View , author=. The Fourteenth International Conference on Learning Representations , year=

  30. [30]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Squeeze-and-excitation networks , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  31. [31]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    ECA-Net: Efficient channel attention for deep convolutional neural networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  32. [32]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Squeeze-and-attention networks for semantic segmentation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  33. [33]

    International conference on medical image computing and computer-assisted intervention , pages=

    3D U-Net: learning dense volumetric segmentation from sparse annotation , author=. International conference on medical image computing and computer-assisted intervention , pages=. 2016 , organization=

  34. [34]

    Expert Systems with Applications , volume=

    MSM-UNet: a medical image segmentation method based on wavelet transform and multi-scale Mamba-UNet , author=. Expert Systems with Applications , volume=. 2025 , publisher=

  35. [35]

    Neural Networks , pages=

    X-UNet: A novel global context-aware collaborative fusion U-shaped network with progressive feature fusion of codec for medical image segmentation , author=. Neural Networks , pages=. 2025 , publisher=

  36. [36]

    Advances in neural information processing systems , volume=

    Early convolutions help transformers see better , author=. Advances in neural information processing systems , volume=

  37. [37]

    Advances in Neural Information Processing Systems , volume=

    Fast fourier convolution , author=. Advances in Neural Information Processing Systems , volume=

  38. [38]

    ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) , pages=

    Mixed transformer u-net for medical image segmentation , author=. ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP) , pages=. 2022 , organization=

  39. [39]

    Neurocomputing , pages=

    A lightweight convolution and vision transformer integrated model with multi-scale self-attention mechanism , author=. Neurocomputing , pages=. 2025 , publisher=

  40. [40]

    Biomedical Signal Processing and Control , volume=

    Transformers in medical image segmentation: A review , author=. Biomedical Signal Processing and Control , volume=. 2023 , publisher=

  41. [41]

    European conference on computer vision , pages=

    Swin-unet: Unet-like pure transformer for medical image segmentation , author=. European conference on computer vision , pages=. 2022 , organization=

  42. [42]

    IEEE Transactions on Medical Imaging , volume=

    Missformer: An effective transformer for 2d medical image segmentation , author=. IEEE Transactions on Medical Imaging , volume=. 2022 , publisher=

  43. [43]

    IEEE Transactions on Instrumentation and Measurement , volume=

    Ds-transunet: Dual swin transformer u-net for medical image segmentation , author=. IEEE Transactions on Instrumentation and Measurement , volume=. 2022 , publisher=

  44. [44]

    Medical transformer: Gated axial-attention for medical image segmentation , author=. Medical image computing and computer assisted intervention--MICCAI 2021: 24th international conference, Strasbourg, France, September 27--October 1, 2021, proceedings, part I 24 , pages=. 2021 , organization=

  45. [45]

    Neural Networks , volume=

    Ct-net: Asymmetric compound branch transformer for medical image segmentation , author=. Neural Networks , volume=. 2024 , publisher=

  46. [46]

    Biomedical Signal Processing and Control , volume=

    SEAformer: Selective Edge Aggregation transformer for 2D medical image segmentation , author=. Biomedical Signal Processing and Control , volume=. 2025 , publisher=

  47. [47]

    Biomedical Signal Processing and Control , volume=

    AgileFormer: Spatially agile and scalable transformer for medical image segmentation , author=. Biomedical Signal Processing and Control , volume=. 2026 , publisher=

  48. [48]

    2025 IEEE/CVF winter conference on applications of computer vision (WACV) , pages=

    Spectformer: Frequency and attention is what you need in a vision transformer , author=. 2025 IEEE/CVF winter conference on applications of computer vision (WACV) , pages=. 2025 , organization=

  49. [49]

    , author=

    FreqFormer: Frequency-aware Transformer for Lightweight Image Super-resolution. , author=. IJCAI , pages=

  50. [50]

    Pattern Recognition , volume=

    Efficient frequency feature aggregation transformer for image super-resolution , author=. Pattern Recognition , volume=. 2025 , publisher=

  51. [51]

    Information Fusion , volume=

    Holistic dynamic frequency transformer for image fusion and exposure correction , author=. Information Fusion , volume=. 2024 , publisher=

  52. [52]

    Proceedings of the 32nd ACM International Conference on Multimedia , pages=

    Loformer: Local frequency transformer for image deblurring , author=. Proceedings of the 32nd ACM International Conference on Multimedia , pages=

  53. [53]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Efficient frequency domain-based transformers for high-quality image deblurring , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  54. [54]

    Information Fusion , volume=

    DBFFT: Adversarial-robust dual-branch frequency domain feature fusion in vision transformers , author=. Information Fusion , volume=. 2024 , publisher=

  55. [55]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Wnet: Audio-guided video object segmentation via wavelet-based cross-modal denoising networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  56. [56]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Xnet: Wavelet-based low and high frequency fusion networks for fully-and semi-supervised semantic segmentation of biomedical images , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  57. [57]

    Knowledge-Based Systems , volume=

    WCMamba: Enhancing high-resolution remote sensing image semantic segmentation with pyramid wavelet convolution and SS2D , author=. Knowledge-Based Systems , volume=. 2025 , publisher=

  58. [58]

    IEEE Transactions on Industrial Informatics , year=

    WTCLIP: A Wavelet-Aware CLIP Framework for Boundary-Refined Weakly Supervised Semantic Segmentation , author=. IEEE Transactions on Industrial Informatics , year=

  59. [59]

    Information Fusion , pages=

    Frequency Domain-Enhanced Spectral-Spatial Fusion Transformer for Semantic Segmentation of Remote Sensing Images , author=. Information Fusion , pages=. 2026 , publisher=

  60. [60]

    Advances in neural information processing systems , volume=

    Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation , author=. Advances in neural information processing systems , volume=

  61. [61]

    Nature communications , volume=

    The medical segmentation decathlon , author=. Nature communications , volume=. 2022 , publisher=

  62. [62]

    Medical Image Analysis , volume=

    Fast and low-GPU-memory abdomen CT organ segmentation: the flare challenge , author=. Medical Image Analysis , volume=. 2022 , publisher=

  63. [63]

    and Farag, Ayman and Turkbey, Evrim B

    Roth, Holger R. and Farag, Ayman and Turkbey, Evrim B. and Lu, Le and Liu, Jiamin and Summers, Ronald M. , title =. 2016 , publisher =. doi:10.7937/K9/TCIA.2016.tNB1kqBU , url =

  64. [64]

    2016 fourth international conference on 3D vision (3DV) , pages=

    V-net: Fully convolutional neural networks for volumetric medical image segmentation , author=. 2016 fourth international conference on 3D vision (3DV) , pages=. 2016 , organization=

  65. [65]

    Nature methods , volume=

    nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation , author=. Nature methods , volume=. 2021 , publisher=

  66. [66]

    Attention U-Net: Learning Where to Look for the Pancreas

    Attention u-net: Learning where to look for the pancreas , author=. arXiv preprint arXiv:1804.03999 , year=

  67. [67]

    Transbts: Multimodal brain tumor segmentation using transformer , author=. Medical Image Computing and Computer Assisted Intervention--MICCAI 2021: 24th International Conference, Strasbourg, France, September 27--October 1, 2021, Proceedings, Part I 24 , pages=. 2021 , organization=

  68. [68]

    Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

    Unetr: Transformers for 3d medical image segmentation , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

  69. [69]

    International MICCAI Brainlesion Workshop , pages=

    Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images , author=. International MICCAI Brainlesion Workshop , pages=. 2021 , organization=

  70. [70]

    arXiv preprint arXiv:2209.15076 , year=

    3d ux-net: A large kernel volumetric convnet modernizing hierarchical transformer for medical image segmentation , author=. arXiv preprint arXiv:2209.15076 , year=

  71. [71]

    International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=

    Mednext: transformer-driven scaling of convnets for medical image segmentation , author=. International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=. 2023 , organization=

  72. [72]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  73. [73]

    IEEE Transactions on Image Processing , year=

    nnformer: Volumetric medical image segmentation via a 3d transformer , author=. IEEE Transactions on Image Processing , year=

  74. [74]

    Medical Image Analysis , pages=

    VSmTrans: A Hybrid Paradigm Integrating Self-attention and Convolution for 3D Medical Image Segmentation , author=. Medical Image Analysis , pages=. 2024 , publisher=

  75. [75]

    Neural Networks , volume=

    MixUNETR: A U-shaped network based on W-MSA and depth-wise convolution with channel and spatial interactions for zonal prostate segmentation in MRI , author=. Neural Networks , volume=. 2025 , publisher=

  76. [76]

    Pattern Recognition , volume=

    3D medical image segmentation using parallel transformers , author=. Pattern Recognition , volume=. 2023 , publisher=

  77. [77]

    Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

    Attentional feature fusion , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=