pith. machine review for the scientific record.

arxiv: 2604.14958 · v1 · submitted 2026-04-16 · 💻 cs.CV


Frequency-Enhanced Dual-Subspace Networks for Few-Shot Fine-Grained Image Classification


Pith reviewed 2026-05-10 11:11 UTC · model grok-4.3

classification 💻 cs.CV
keywords few-shot learning · fine-grained classification · frequency domain · subspace learning · metric learning · discrete cosine transform · image classification · low-frequency features

The pith

Frequency components paired with spatial features reduce overfitting in few-shot fine-grained image classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a network that isolates low-frequency global structure from spatial features using the discrete cosine transform and low-pass filtering. These frequency components feed one subspace while raw spatial textures feed another, each built through truncated singular value decomposition. Projection distances from the two subspaces are combined by an adaptive gate that favors the more stable frequency view. The approach targets the problem of distinguishing visually similar subcategories when only a handful of labeled examples exist. If the dual view works as described, models would handle background noise better and maintain geometric consistency across limited samples.

Core claim

FEDSNet utilizes the Discrete Cosine Transform and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition is employed to construct independent, low-rank linear subspaces for both spatial texture and frequency structural features. An adaptive gating mechanism is designed to dynamically fuse the projection distances from these dual views, leveraging the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features.
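The frequency branch described above can be sketched in a few lines. This is an illustrative reading, not the paper's code: the abstract does not specify the filter shape or cutoff, so the square low-pass mask and the `keep_ratio` knob below are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def lowpass_dct(feat, keep_ratio=0.25):
    """Isolate low-frequency structure from a 2-D feature map via the DCT.

    feat: (H, W) array. keep_ratio: fraction of coefficients retained per
    axis (a hypothetical knob; the paper's actual cutoff is not given).
    """
    coeffs = dctn(feat, norm="ortho")          # 2-D DCT-II
    h = max(1, int(feat.shape[0] * keep_ratio))
    w = max(1, int(feat.shape[1] * keep_ratio))
    mask = np.zeros_like(coeffs)
    mask[:h, :w] = 1.0                         # low frequencies sit in the top-left corner
    return idctn(coeffs * mask, norm="ortho")  # back to the spatial domain

# A smooth ramp (global structure) survives the filter, while a
# checkerboard (high-frequency clutter) is largely suppressed.
ramp = np.add.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
noise = 0.5 * (-1.0) ** np.add.outer(np.arange(8), np.arange(8))
recovered = lowpass_dct(ramp + noise)
```

Applied per channel to backbone feature maps, this kind of filtering would produce the frequency view that feeds the second subspace.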

What carries the argument

Dual low-rank subspaces built by truncated SVD on spatial features and on DCT-isolated low-frequency components, fused by an adaptive gating mechanism on projection distances.
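A minimal sketch of that machinery, under assumed conventions (mean-centered class subspaces, Euclidean projection residuals, and a fixed scalar gate standing in for the learned gating network; none of these specifics are confirmed by the abstract):

```python
import numpy as np

def class_subspace(support, rank=3):
    """Build a low-rank class subspace from support features via truncated SVD.

    support: (n_shot, d) feature matrix for one class.
    Returns the class mean and an orthonormal basis U of shape (d, rank).
    """
    mu = support.mean(axis=0)
    U, _, _ = np.linalg.svd((support - mu).T, full_matrices=False)
    return mu, U[:, :rank]

def projection_dist(q, mu, U):
    """Norm of the component of query q lying outside the subspace mu + span(U)."""
    r = q - mu
    return np.linalg.norm(r - U @ (U.T @ r))

def gated_distance(q_spatial, q_freq, sub_spatial, sub_freq, gate=0.5):
    """Fuse spatial and frequency projection distances with a scalar gate in [0, 1].

    FEDSNet learns this gate adaptively; a fixed value is used here purely
    for illustration.
    """
    d_s = projection_dist(q_spatial, *sub_spatial)
    d_f = projection_dist(q_freq, *sub_freq)
    return gate * d_f + (1.0 - gate) * d_s
```

A query would be assigned to the class with the smallest gated distance; in the paper the gate is presumably predicted per episode rather than fixed.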

If this is right

  • The model produces highly competitive accuracy on CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft under few-shot protocols.
  • Background interference is suppressed by isolating low-frequency global structures before subspace projection.
  • Single-view metric learning is replaced by cross-view fusion that supplies geometric constraints missing from spatial-only methods.
  • Computational cost remains balanced with accuracy, as shown by the complexity analysis in the experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frequency-plus-spatial split could be tested on other few-shot vision problems where texture biases dominate limited training sets.
  • Replacing DCT with wavelets or Fourier transforms would test whether the stability benefit is specific to cosine-based low-pass isolation.
  • If the frequency subspace proves robust across domains, it suggests that metric learning in general may benefit from explicit multi-domain constraints rather than deeper spatial networks alone.

Load-bearing premise

Low-frequency components isolated by DCT and low-pass filtering reliably capture stable structural information separate from high-frequency background noise.

What would settle it

An ablation study on the same four benchmarks showing no accuracy gain or a drop when the frequency subspace and low-pass filter are removed would falsify the claim that the dual-subspace fusion prevents overfitting.

Figures

Figures reproduced from arXiv: 2604.14958 by Bin Yao, Guochao Wang, Haozhen Chu, Junpo Yang, Meijia Wang, Weichuan Zhang, Yuan Wang.

Figure 1
Figure 1: Comparison of feature representations and subspace manifolds. Visualizations of feature attention maps (left, generated via Grad-CAM [34]) and their corresponding geometric subspace structures (right) are presented. (a) Existing subspace methods (e.g., DSN) are highly susceptible to background interference. Due to feature entanglement, the constructed singular subspace is forced to overfit high-frequency b… view at source ↗
Figure 2
Figure 2: The overall architecture of the proposed FEDSNet framework. Input images are first encoded by a shared backbone (e.g., ResNet-12) to extract spatial feature maps. Subsequently, the pipeline diverges into two complementary branches: (1) Frequency-aware Decomposition Module: In the frequency stream, robust global structural features are explicitly decoupled from high-frequency noise via the Discrete Cosine T… view at source ↗
Figure 3
Figure 3: Geometric interpretation of subspace creation and projection metric. (a) Subspace Construction: Given a 5-shot support set (green spheres) containing instance-specific noise, Truncated SVD constructs a low-rank class subspace P (blue plane). This subspace captures the principal data manifold spanned by the basis vectors U, filtering out-of-plane noise. (b) Projection Metric: To classify a query sample q (r… view at source ↗
original abstract

Few-shot fine-grained image classification aims to recognize subcategories with high visual similarity using only a limited number of annotated samples. Existing metric learning-based methods typically rely solely on spatial domain features. Confined to this single perspective, models inevitably suffer from inherent texture biases, entangling essential structural details with high-frequency background noise. Furthermore, lacking cross-view geometric constraints, single-view metrics tend to overfit this noise, resulting in structural instability under few-shot conditions. To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). Specifically, FEDSNet utilizes the Discrete Cosine Transform (DCT) and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is employed to construct independent, low-rank linear subspaces for both spatial texture and frequency structural features. An adaptive gating mechanism is designed to dynamically fuse the projection distances from these dual views. This strategy leverages the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features. Extensive experiments on four benchmark datasets - CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft - demonstrate that FEDSNet exhibits excellent classification performance and robustness, achieving highly competitive results compared to existing metric learning algorithms. Complexity analysis further confirms that the proposed network achieves a favorable balance between high accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet) for few-shot fine-grained image classification. It applies DCT with low-pass filtering to isolate low-frequency global structural components from spatial features, suppressing background noise. Truncated SVD constructs independent low-rank subspaces for spatial texture and frequency structural features, which are then fused via an adaptive gating mechanism on projection distances. This dual-view approach is claimed to leverage frequency-subspace stability to prevent spatial-subspace overfitting under few-shot constraints. Experiments on CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft are said to yield highly competitive results versus existing metric-learning methods, with favorable accuracy-efficiency trade-offs.

Significance. If the empirical claims hold with proper validation, the work introduces a concrete dual-subspace fusion strategy that explicitly incorporates frequency-domain stability to mitigate texture bias and noise overfitting in few-shot fine-grained recognition. This could provide a useful architectural paradigm for metric-learning methods in domains where background interference is common, particularly if the frequency view demonstrably supplies complementary structural cues without discarding discriminative high-frequency content.

major comments (3)
  1. [Abstract] The central claim of 'excellent classification performance and robustness' and 'highly competitive results' is unsupported by any numerical accuracies, standard deviations, ablation tables, or baseline comparisons in the abstract; without these in the experiments section, the performance assertions cannot be evaluated.
  2. [Method] Frequency isolation via DCT/low-pass: The design assumes low-frequency components reliably encode stable structural information that prevents spatial overfitting, yet fine-grained distinctions (e.g., feather patterns on CUB-200-2011 or car details on Stanford Cars) often reside in high-frequency textures; explicit low-pass filtering risks discarding these cues, and no analysis or counter-example is provided showing the retained low-rank SVD subspace remains sufficiently discriminative.
  3. [Experiments] The robustness claim rests on the adaptive gating fusion stabilizing performance, but without reported ablations (e.g., spatial-only vs. dual-subspace, low-pass vs. high-pass variants, or gating ablations) or error bars across the four datasets, it is impossible to confirm that the frequency subspace supplies the claimed stabilizing effect rather than shared global shape.
minor comments (2)
  1. [Abstract] Vague phrasing such as 'excellent classification performance' and 'favorable balance' should be replaced with concrete metrics (e.g., accuracy percentages and FLOPs) once the full results are presented.
  2. [Method] Notation: The description of 'projection distances from these dual views' is underspecified; clarify whether these are Euclidean distances in the respective subspaces or another metric, and how the gating weights are computed.
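One plausible reading of the underspecified metric, borrowed from prior subspace methods such as DSN [10] rather than from the paper itself: with class mean \(\mu_c\) and truncated-SVD basis \(U_c\) for a given view, the projection distance is the Euclidean norm of the out-of-subspace residual, and the gate forms a convex combination of the two views,

```latex
d_c^{\mathrm{view}}(q) = \bigl\| (I - U_c U_c^{\top})(q - \mu_c) \bigr\|_2,
\qquad
d_c(q) = g \, d_c^{\mathrm{freq}}(q) + (1 - g)\, d_c^{\mathrm{spat}}(q),
```

where \(g \in [0, 1]\) is the adaptive gate. Whether the gate is scalar or per-class, and how it is parameterized, is what the notation comment asks the authors to state.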

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the presentation of our results and methodological justifications. We address each major comment point by point below.

point-by-point responses
  1. Referee: [Abstract] The central claim of 'excellent classification performance and robustness' and 'highly competitive results' is unsupported by any numerical accuracies, standard deviations, ablation tables, or baseline comparisons in the abstract; without these in the experiments section, the performance assertions cannot be evaluated.

    Authors: We agree that the abstract would be more informative with explicit quantitative support for the performance claims. In the revised manuscript, we will update the abstract to include key accuracy results on the four benchmark datasets (CUB-200-2011, Stanford Cars, Stanford Dogs, FGVC-Aircraft) and brief comparisons to state-of-the-art metric-learning baselines. The full tables with standard deviations and ablations remain detailed in the experiments section. revision: yes

  2. Referee: [Method] Frequency isolation via DCT/low-pass: The design assumes low-frequency components reliably encode stable structural information that prevents spatial overfitting, yet fine-grained distinctions (e.g., feather patterns on CUB-200-2011 or car details on Stanford Cars) often reside in high-frequency textures; explicit low-pass filtering risks discarding these cues, and no analysis or counter-example is provided showing the retained low-rank SVD subspace remains sufficiently discriminative.

    Authors: We appreciate the referee's point on potential loss of discriminative high-frequency cues. The spatial subspace is explicitly intended to retain texture details (including fine-grained patterns), while the frequency subspace supplies complementary low-frequency structural stability via low-rank SVD. The adaptive gating mechanism then balances the two views. To directly address the request for analysis, we will add a new subsection with visualizations of retained frequency components and their classification contributions in the revised manuscript. revision: yes

  3. Referee: [Experiments] The robustness claim rests on the adaptive gating fusion stabilizing performance, but without reported ablations (e.g., spatial-only vs. dual-subspace, low-pass vs. high-pass variants, or gating ablations) or error bars across the four datasets, it is impossible to confirm that the frequency subspace supplies the claimed stabilizing effect rather than shared global shape.

    Authors: We agree that additional ablations and error bars are necessary to isolate the contribution of the frequency subspace and gating. In the revised manuscript, we will include comprehensive ablation studies (spatial-only, frequency-only, dual-subspace, low-pass vs. high-pass, and gating variants) and report mean accuracies with standard deviations over multiple runs across all four datasets to substantiate the stabilizing effect. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical architectural proposal

full rationale

The paper presents FEDSNet as a novel network architecture that applies DCT low-pass filtering to isolate low-frequency structural components, constructs dual low-rank subspaces via truncated SVD, and fuses projection distances with an adaptive gating mechanism. These elements are introduced as design choices to mitigate spatial overfitting in few-shot fine-grained classification, with claims supported by experimental results on standard benchmarks rather than any mathematical derivation. No equations, predictions, or uniqueness arguments reduce by construction to fitted inputs or prior self-citations; the derivation chain consists of independent architectural decisions validated externally.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard signal-processing assumptions about DCT and low-pass filtering plus new architectural choices; no explicit free parameters or invented entities are stated in the abstract.

axioms (1)
  • domain assumption DCT combined with low-pass filtering isolates low-frequency global structural components from spatial features while suppressing background interference
    Invoked in the method description to justify the frequency subspace.

pith-pipeline@v0.9.0 · 5592 in / 1201 out tokens · 33967 ms · 2026-05-10T11:11:43.187640+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

71 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1]

    Deep residual learning for image recognition

    He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV , USA, 27–30 June 2016; pp. 770–778

  2. [2]

    Few-Shot Fine-Grained Image Classification: A Comprehensive Review

    Ren, J.; Li, C.; An, Y.; Zhang, W.; Sun, C. Few-Shot Fine-Grained Image Classification: A Comprehensive Review. Acta Autom. Sin. 2024, 50, 26–44

  3. [3]

    Matching Networks for One Shot Learning

    Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016; Volume 29

  4. [4]

    Prototypical Networks for Few-shot Learning

    Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30

  5. [5]

    Learning to compare: Relation network for few-shot learning

    Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P .H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208

  6. [6]

    Revisiting local descriptor based image-to-class measure for few-shot learning

    Li, W.; Wang, L.; Xu, J.; Huo, J.; Gao, Y.; Luo, J. Revisiting local descriptor based image-to-class measure for few-shot learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7260–7268

  7. [7]

    Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification

    Huang, H.; Zhang, J.; Zhang, J.; Xu, J.; Wu, Q. Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification. IEEE Trans. Multimed. 2021, 23, 1666–1680

  8. [8]

    Mixture-based feature space learning for few-shot image classification

    Afrasiyabi, A.; Lalonde, J.F.; Gagné, C. Mixture-based feature space learning for few-shot image classification. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9041–9051

  9. [9]

    DeepEMD: Differentiable earth mover’s distance for few-shot learning

    Zhang, C.; Cai, Y.; Lin, G.; Shen, C. DeepEMD: Differentiable earth mover’s distance for few-shot learning. IEEE Trans. Pattern Anal. Mach. Intell.2022,44, 6358–6375

  10. [10]

    Adaptive Subspaces for Few-Shot Learning

    Simon, C.; Koniusz, P.; Nock, R.; Harandi, M. Adaptive Subspaces for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4136–4145

  11. [11]

    Learning in the frequency domain

    Xu, K.; Qin, M.; Sun, F.; Wang, Y.; Chen, Y.K.; Ren, F. Learning in the frequency domain. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1740–1749

  12. [12]

    Global filter networks for image classification

    Rao, Y.; Zhao, W.; Zhu, Z.; Lu, J.; Zhou, J. Global filter networks for image classification. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021; Volume 34, pp. 980–993

  13. [13]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues

    Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8603–8612

  14. [14]

    Robust recovery of subspace structures by low-rank representation

    Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184

  15. [15]

    Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition

    Zhu, Y.; Liu, C.; Jiang, S. Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan, 11–17 July 2020; pp. 1090–1096

  16. [16]

    Dual Attention Networks for Few-Shot Fine-Grained Recognition

    Xu, S.; Zhang, F.; Wei, X.; Wang, J. Dual Attention Networks for Few-Shot Fine-Grained Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 2970–2978

  17. [17]

    Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification

    Wu, Y.; Zhang, B.; Yu, G.; Zhang, W.; Wang, B.; Chen, T.; Fan, J. Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification. InProceedings of the ACM International Conference on Multimedia (ACM MM), Chengdu, China, 20–24 October 2021; pp. 3172–3180

  18. [18]

    Generalizing from a few examples: A survey on few-shot learning

    Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv.2020,53, 1–34

  19. [19]

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Finn, C.; Abbeel, P .; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 1126–1135

  20. [20]

    Meta-SGD: Learning to Learn Quickly for Few-Shot Learning

    Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.arXiv2017, arXiv:1707.09835

  21. [21]

    Task agnostic meta-learning for few-shot learning

    Jamal, M.A.; Qi, G.J. Task agnostic meta-learning for few-shot learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11719–11727

  22. [22]

    Siamese neural networks for one-shot image recognition

    Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. InProceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2

  23. [23]

    Deep metric learning using triplet network

    Hoffer, E.; Ailon, N. Deep metric learning using triplet network. InProceedings of the International Workshop on Similarity-Based Pattern Recognition, Copenhagen, Denmark, 12–14 October 2015; pp. 84–92

  24. [24]

    Few-Shot Classification With Feature Map Reconstruction Networks

    Wertheimer, D.; Tang, L.; Hariharan, B. Few-Shot Classification With Feature Map Reconstruction Networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8012–8021

  25. [25]

    Bi-directional feature reconstruction network for fine-grained few-shot image classification

    Wu, J.; Chang, D.; Sain, A.; Li, X.; Ma, Z.; Cao, J.; Jun, G.; Song, Y.Z. Bi-directional feature reconstruction network for fine-grained few-shot image classification. InProceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 2821–2829

  26. [26]

    Wah, C.; Branson, S.; Welinder, P .; Perona, P .; Belongie, S.The Caltech-UCSD Birds-200-2011 Dataset; Technical Report CNS-TR-2011-001; California Institute of Technology: Pasadena, CA, USA, 2011

  27. [27]

    3D Object Representations for Fine-Grained Categorization

    Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Sydney, Australia, 2–8 December 2013; pp. 554–561

  28. [28]

    Fine-Grained Visual Classification of Aircraft

    Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-Grained Visual Classification of Aircraft.arXiv 2013, arXiv:1306.5151

  29. [29]

    BSNet: Bi-similarity network for few-shot fine-grained image classification

    Li, X.; Wu, J.; Sun, Z.; Ma, Z.; Cao, J.; Xue, J.H. BSNet: Bi-similarity network for few-shot fine-grained image classification. IEEE Trans. Image Process. 2020, 30, 1318–1331

  30. [30]

    Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

    Ma, Z.X.; Chen, Z.D.; Zhao, L.J. Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification. InProceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 4136–4144

  31. [31]

    Boosting few-shot fine-grained recognition with background suppression and foreground alignment

    Zha, Z.; Tang, H.; Sun, Y. Boosting few-shot fine-grained recognition with background suppression and foreground alignment. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3947–3961

  32. [32]

    Crosstransformers: spatially-aware few-shot transfer

    Doersch, C.; Gupta, A.; Zisserman, A. Crosstransformers: spatially-aware few-shot transfer. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; Volume 33, pp. 21981–21993

  33. [33]

    Global and local feature interaction with vision transformer for few-shot image classification

    Sun, M.; Ma, W.; Liu, Y. Global and local feature interaction with vision transformer for few-shot image classification. InProceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM), Atlanta, GA, USA, 17–21 October 2022; pp. 4530–4534

  34. [34]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. InProceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626

  35. [35]

    A Tutorial on Principal Component Analysis

    Shlens, J. A tutorial on principal component analysis. arXiv 2014, arXiv:1404.1100

  36. [36]

    Imagenet classification with deep convolutional neural networks

    Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV , USA, 3–6 December 2012; Volume 25

  37. [37]

    Meta-Learning with Latent Embedding Optimization

    Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-Learning with Latent Embedding Optimization. InProceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019

  38. [38]

    Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

    Raghu, A.; Raghu, M.; Bengio, S.; Vinyals, O. Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. InProceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020

  39. [39]

    Low-shot visual recognition by shrinking and hallucinating features

    Hariharan, B.; Girshick, R. Low-shot visual recognition by shrinking and hallucinating features. InProceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3015–3024

  40. [40]

    Delta-encoder: an effective sample synthesis method for few-shot object recognition

    Schwartz, E.; Karlinsky, L.; Shtok, J.; Harary, S.; Marder, M.; Kumar, A.; Bronstein, A. Delta-encoder: an effective sample synthesis method for few-shot object recognition. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 3–8 December 2018; Volume 31

  41. [41]

    Asymmetric Distribution Measure for Few-shot Learning

    Li, W.; Wang, L.; Huo, J.; Shi, Y.; Gao, Y.; Luo, J. Asymmetric Distribution Measure for Few-shot Learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Virtual, 7–15 January 2020; pp. 3081–3087

  42. [42]

    Squeeze-and-excitation networks

    Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141

  43. [43]

    Cbam: Convolutional block attention module

    Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. InProceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19

  44. [44]

    Context-Aware Task-Specific Metric Learning for Few-Shot Classification

    Li, Q.; Cai, W.; Wang, Y.; Zhou, H.; Predovic, G.; Feng, D.D. Context-Aware Task-Specific Metric Learning for Few-Shot Classification. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 736–750

  45. [45]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. InProceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021

  46. [46]

    Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning

    Hu, S.; Xie, Z.; Liu, H.; Nie, J.; He, Z.; Liu, Y. Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 9068–9077

  47. [47]

    SVDNet for Pedestrian Retrieval

    Sun, Y.; Zheng, L.; Deng, W.; Wang, S. SVDNet for Pedestrian Retrieval. InProceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3800–3808

  48. [48]

    ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

    Geirhos, R.; Rubisch, P .; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. InProceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019

  49. [49]

    FcaNet: Frequency Channel Attention Networks

    Qin, Z.; Zhang, P .; Wu, F.; Li, X. FcaNet: Frequency Channel Attention Networks. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 783–792

  50. [50]

    Meta-Learning with Differentiable Convex Optimization

    Lee, K.; Maji, S.; Ravichandran, A.; Soatto, S. Meta-Learning with Differentiable Convex Optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10657–10665

  51. [51]

    Distribution Consistency based Covariance Metric Networks for Few-Shot Learning

    Li, W.; Xu, J.; Huo, J.; Wang, L.; Gao, Y.; Luo, J. Distribution Consistency based Covariance Metric Networks for Few-Shot Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 8642–8649

  52. [52]

    Learning cross-image object semantic relation in transformer for few-shot fine-grained image classification

    Zhang, B.; Yuan, J.; Li, B.; Chen, T.; Fan, J.; Shi, B. Learning cross-image object semantic relation in transformer for few-shot fine-grained image classification.arXiv Preprint2022, arXiv:2207.00784

  53. [53]

    TOAN: Target-oriented alignment network for fine-grained image categorization with few labeled samples

    Huang, H.; Zhang, J.; Yu, L.; Zhang, J.; Wu, Q.; Xu, C. TOAN: Target-oriented alignment network for fine-grained image categorization with few labeled samples. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 853–866

  54. [54]

    Huang, H.; Wu, Z.; Li, W.; Huo, J.; Gao, Y. Local descriptor-based multi-prototype network for few-shot learning. Pattern Recognition 2021, 116, 107935.

  55. [55]

    Jing, J.; Liu, S.; Wang, G.; et al. Recent advances on image edge detection: A comprehensive review. Neurocomputing 2022, 503, 259–271.

  56. [56]

    Shui, P.-L.; Zhang, W.-C. Corner detection and classification using anisotropic directional derivative representations. IEEE Transactions on Image Processing 2013, 22, 3204–3218.

  57. [57]

    Zhang, W.-C.; Zhao, Y.-L.; Breckon, T.P.; et al. Noise robust image edge detection based upon the automatic anisotropic Gaussian kernels. Pattern Recognition 2017, 63, 193–205.

  58. [58]

    Zhang, W.-C.; Shui, P.-L. Contour-based corner detection via angle difference of principal directions of anisotropic Gaussian directional derivatives. Pattern Recognition 2015, 48, 2785–2797.

  59. [59]

    Zhang, W.; Sun, C. Corner detection using multi-directional structure tensor with multiple scales. International Journal of Computer Vision 2020, 128, 438–459.

  60. [60]

    Zhang, W.; Sun, C.; Breckon, T.; et al. Discrete curvature representations for noise robust image corner detection. IEEE Transactions on Image Processing 2019, 28, 4444–4459.

  61. [61]

    Jing, J.; Gao, T.; Zhang, W.; et al. Image feature information extraction for interest point detection: A comprehensive review. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 45, 4694–4712.

  62. [62]

    Zhang, W.; Sun, C. Corner detection using second-order generalized Gaussian directional derivative representations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019, 43, 1213–1224.

  63. [63]

    Zhang, W.; Zhao, Y.; Gao, Y.; et al. Re-abstraction and perturbing support pair network for few-shot fine-grained image classification. Pattern Recognition 2024, 148, 110158.

  64. [64]

    Zhang, W.; Sun, C.; Gao, Y. Image intensity variation information for interest point detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 9883–9894.

  65. [65]

    Lei, T.; Song, W.; Zhang, W.; et al. Semi-supervised 3-D medical image segmentation using multiconsistency learning with fuzzy perception-guided target selection. IEEE Transactions on Radiation and Plasma Medical Sciences 2024, 9, 421–432.

  66. [66]

    Ren, J.; An, Y.; Lei, T.; et al. Adaptive feature selection-based feature reconstruction network for few-shot learning. Pattern Recognition 2025, 112289.

  67. [67]

    Wang, M.; Zheng, B.; Wang, G.; et al. A principal component analysis-based feature optimization network for few-shot fine-grained image classification. Mathematics 2025, 13, 1098.

  68. [68]

    Wang, W.; Wang, M.; Wang, H.; et al. Feature complementation architecture for visual place recognition. arXiv 2025, arXiv:2506.12401.

  69. [69]

    Wang, J.; Lu, J.; Yang, J.; et al. An unbiased feature estimation network for few-shot fine-grained image classification. Sensors 2024, 24, 7737.

  70. [70]

    Lu, J.; Wu, W.; Gao, K.; et al. Meningioma Analysis and Diagnosis using Limited Labeled Samples. arXiv 2026, arXiv:2602.13335.

  71. [71]

    Ren, L.; Lu, J.; Zhang, W.; et al. Deep learning-based neurodevelopmental assessment in preterm infants. arXiv 2026, arXiv:2601.11944.