Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition

Guanqun Zhao; Hongwen Yang; Jiaxuan Fang; Yitong Liu; Yufei Mao

arxiv: 2605.26600 · v1 · pith:R6UHXKHBnew · submitted 2026-05-26 · 💻 cs.LG · cs.AI


Pith reviewed 2026-06-29 19:47 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords few-shot learning · contrastive learning · automatic modulation recognition · self-supervised learning · virtual adversarial augmentation · signal classification · spectral regularization


The pith

DyCo-CL couples virtual adversarial augmentation with semantic consistency loss to improve 1-shot automatic modulation recognition by 6.27%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Dynamic-Consistency Contrastive Learning (DyCo-CL) to overcome ineffective isotropic augmentations, spectral instability, and semantic drift in self-supervised learning for automatic modulation recognition (AMR). It pairs virtual adversarial augmentation with a semantic consistency loss, which the authors analyze as an implicit spectral regularizer that supports stable manifold exploration in the encoder. A signal-adaptive Swin backbone with fixed-window attention is added to constrain attention locality, and a hybrid knowledge fusion module incorporates physical priors. Experiments on RML benchmarks report a 6.27% accuracy gain in 1-shot settings. A sympathetic reader would care if this reduces the labeled data needed for reliable signal classification in communications systems.

Core claim

DyCo-CL is a geometry-aware contrastive learning framework that couples Virtual Adversarial Augmentation with a semantic consistency loss; the authors' theoretical analysis states that this coupling acts as an implicit spectral regularizer for the encoder and thereby enables stable manifold exploration. The framework adds a Signal-Adaptive Swin Backbone to improve structural stability through constrained attention locality and a Hybrid Knowledge Fusion module to anchor representations with physical priors, resulting in a measured 6.27% accuracy gain over prior methods in 1-shot AMR on RML benchmarks.

What carries the argument

The coupling of Virtual Adversarial Augmentation with semantic consistency loss, which the paper states functions as an implicit spectral regularizer enabling stable manifold exploration in the encoder.
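To make the augmentation step concrete, here is a minimal sketch of how a virtual adversarial perturbation of the form r_adv = ε · g / ‖g‖₂ could be computed, where g is the gradient of a KL divergence between predictions on the clean and perturbed signal (the labels shown in Figure 1). Everything else is an assumption for illustration: `predict` is a toy linear softmax model standing in for the encoder, the gradient is taken by finite differences rather than automatic differentiation, and the names `vaa_perturbation`, `xi`, and `h` are hypothetical and do not come from the paper.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    # Toy stand-in for encoder + head (assumption): softmax(W x).
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def kl(p, q):
    # KL(p || q) for two discrete distributions with q > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def vaa_perturbation(W, x, eps=0.5, xi=1e-2, h=1e-5, seed=0):
    """Return r_adv = eps * g / ||g||_2 for a toy model (illustrative only)."""
    rng = random.Random(seed)
    d = normalize([rng.gauss(0.0, 1.0) for _ in x])  # random unit direction
    p_clean = predict(W, x)

    def divergence(r):
        return kl(p_clean, predict(W, [a + b for a, b in zip(x, r)]))

    base = [xi * di for di in d]  # small probe step away from r = 0
    g = []
    for i in range(len(x)):
        plus, minus = list(base), list(base)
        plus[i] += h
        minus[i] -= h
        # Central finite difference of the divergence w.r.t. r[i].
        g.append((divergence(plus) - divergence(minus)) / (2.0 * h))
    return [eps * gi for gi in normalize(g)]

# Example call with arbitrary toy weights and a 3-sample "signal".
W_demo = [[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]]
r_demo = vaa_perturbation(W_demo, [0.3, -0.1, 0.8], eps=0.5)
```

Probing at a small step `xi * d` instead of exactly at r = 0 follows the usual virtual adversarial training recipe: the divergence has zero gradient at r = 0, so one step of this kind approximates the dominant curvature direction.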

If this is right

The method produces a 6.27% accuracy improvement in 1-shot settings on RML benchmarks compared with prior approaches.
Constraining attention locality in the signal-adaptive Swin backbone increases structural stability of the learned representations.
Incorporating physical priors through the hybrid knowledge fusion module anchors the representations to domain knowledge.
The implicit spectral regularization supports more stable exploration of the data manifold during contrastive training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar regularization pairings could be tested on other few-shot time-series classification tasks where spectral stability matters.
If the physical priors in the fusion module are domain-specific, the gains may vary when the modulation schemes or channel conditions change.
The approach might lower the volume of labeled data required for practical deployment of modulation classifiers in wireless systems.

Load-bearing premise

The coupling of virtual adversarial augmentation with semantic consistency loss correctly acts as an implicit spectral regularizer that enables stable manifold exploration and directly produces the observed accuracy gains.

What would settle it

An ablation experiment in which removing the semantic consistency loss eliminates the 6.27% gain, or direct measurement showing that the encoder spectra do not stabilize as predicted by the regularizer, would falsify the central mechanism.

Figures

Figures reproduced from arXiv: 2605.26600 by Guanqun Zhao, Hongwen Yang, Jiaxuan Fang, Yitong Liu, Yufei Mao.

**Figure 1.** The overall architecture of DyCo-CL. It features a Signal-Adaptive Swin Backbone optimized via Dynamic-Consistency Pre-training (utilizing VAA and SC loss). The learned features are subsequently enhanced by a Hierarchical Hybrid Knowledge Fusion module that integrates spatio-temporal physical priors. Diagram labels: signal $x$; predictions $P(\cdot \mid x)$ and $P(\cdot \mid x_{\xi})$ (perturbed); KL divergence; gradient calculation $g$; $r_{adv} = \epsilon\, g / \lVert g \rVert_2$; query encod…

**Figure 2.** The generation process of Virtual Adversarial Augmentation. Adjacent text: We seek the optimal perturbation $r^*$ within an $\epsilon$-ball that maximizes this divergence: $r^* = \arg\max_{\lVert r \rVert_2 \le \epsilon} J(r)$ (8). As derived in Appendix A.2, the optimal perturbation $r^*$ aligns with the dominant eigenvector of the Hessian matrix of $J(r)$ at $r = 0$.

**Figure 3.** Structure of the Signal-Adaptive Swin Backbone. Adjacent text (Section 4.2.3, Overall Pre-training Objective): The final objective combines the instance-discrimination contrastive loss ($\mathcal{L}_{NCE}$) with semantic consistency. The total loss is defined as $\mathcal{L}_{total} = \mathcal{L}_{NCE}(x_{adv}, x_{weak}) + \lambda_{sc} \cdot \mathcal{L}_{SC}(x, x_{adv})$ (11), where $\lambda_{sc}$ balances representation diversity and semantic fidelity. The training procedure is summarized in Algorithm 1.

**Figure 4.** Architecture of the Fusion Module. Adjacent text (Section 4.3.1, Deep Convolutional Stem): To mitigate the noise sensitivity of linear embeddings, we design a hierarchical Deep Convolutional Stem (…

**Figure 5.** Classification accuracy comparison with varying N on RML2016.10a. Extracted plot text: x-axis SNR [dB], y-axis Accuracy. Panel (a), N = 1, legend: AMC-CNN (9.09%), APFS (35.43%), CMSSAN (36.92%), EET-MoCo (30.30%), Resnet50-MoCo (24.87%), SSCL-AMC (36.47%), DyCo-CL (Ours) (43.84%). Next panel, legend: AMC-CNN (9.09%), APFS (36.27%), CMSSAN (38.09%), EET-MoCo (36.25%), Re…

**Figure 6.** Accuracy vs. SNR comparison under low-data regimes on RML2016.10a. Adjacent text: … consistently outperforms prior methods across SNRs. On RML2016.10a, it exceeds the SOTA by 7.7% at 10 dB (N = 1), and on RML2018.01a (…

**Figure 7.** Accuracy vs. SNR comparison on RML2018.01a (10-shot).

**Figure 9.** t-SNE Visualization (N = 10, 10 dB). Feature distributions on RML2016.10A (left) and RML2018.01A (right). Adjacent text: … clear inter-class margins, even with the increased complexity of the 2018 dataset. This validates that our spectral regularization effectively stabilizes the signal manifold across different data scales. (Section 7, Conclusion) In this work, we address concentration of measure, spectral instability, and semant…

**Figure 10.** Confusion Matrices across different SNRs (…
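The pre-training objective quoted with Figure 3, L_total = L_NCE(x_adv, x_weak) + λ_sc · L_SC(x, x_adv), can be illustrated with a short sketch. Only the overall structure of the sum comes from the paper. The InfoNCE form below is the standard in-batch version, while the concrete form of the semantic consistency term (mean of one minus cosine similarity), the temperature `tau`, the default `lambda_sc`, and all function names are assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(queries, keys, tau=0.1):
    """Standard in-batch InfoNCE: the key at index i is the positive for query i."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / tau for kj in k]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(q)

def semantic_consistency(z_clean, z_adv):
    """Hypothetical SC term: mean (1 - cosine similarity) between paired embeddings."""
    sims = [dot(l2_normalize(a), l2_normalize(b)) for a, b in zip(z_clean, z_adv)]
    return sum(1.0 - s for s in sims) / len(sims)

def total_loss(z_clean, z_adv, z_weak, lambda_sc=1.0, tau=0.1):
    # Mirrors L_total = L_NCE(x_adv, x_weak) + lambda_sc * L_SC(x, x_adv),
    # evaluated on embeddings of the three views.
    return info_nce(z_adv, z_weak, tau) + lambda_sc * semantic_consistency(z_clean, z_adv)

# Toy embeddings for a batch of two signals (made-up values).
z = [[1.0, 0.0], [0.0, 1.0]]
aligned = total_loss(z, z, z)
misaligned = total_loss(z, [[0.0, 1.0], [1.0, 0.0]], z)
```

In this toy batch the aligned views give a loss close to zero, and swapping the adversarial embeddings raises both terms, which is the trade-off λ_sc is described as balancing.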

Original abstract

Standard Self-Supervised Learning (SSL) for Automatic Modulation Recognition (AMR) struggles with ineffective isotropic augmentations, spectral instability, and semantic drift. To address these challenges, we propose Dynamic-Consistency Contrastive Learning (DyCo-CL), a geometry-aware framework that couples Virtual Adversarial Augmentation (VAA) with a semantic consistency loss. We provide a theoretical analysis indicating that this strategy acts as an implicit spectral regularizer for the encoder, enabling stable manifold exploration. Complementing this, our Signal-Adaptive Swin Backbone with fixed-window attention improves structural stability by constraining attention locality, while a Hybrid Knowledge Fusion module anchors representations with physical priors. Experiments on RML benchmarks show that DyCo-CL achieves a 6.27% accuracy gain in 1-shot settings over prior methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The 6.27% 1-shot gain is stated without variance, repeated trials, or sampling details, so the central empirical claim cannot be evaluated yet.


The paper introduces DyCo-CL, which pairs virtual adversarial augmentation with a semantic consistency loss inside a contrastive framework for few-shot AMR. It also uses a Signal-Adaptive Swin backbone with fixed-window attention and adds a Hybrid Knowledge Fusion module to incorporate physical priors. A theoretical note claims the combination acts as an implicit spectral regularizer.

The work does a clear job naming practical problems with off-the-shelf SSL on modulation signals, such as augmentations that ignore signal geometry and the risk of semantic drift. Adapting the Swin architecture and injecting domain knowledge are reasonable domain-specific moves.

The main weakness is the evidence. The abstract reports a 6.27% accuracy lift in 1-shot settings on RML benchmarks but supplies no baselines, ablations, standard deviations, number of episodes, or significance tests. In 1-shot regimes, results are sensitive to which support examples are chosen, so a lone delta without those checks is not reliable. The theoretical analysis is asserted but not shown, leaving nothing to inspect.
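The kind of reporting asked for here can be sketched in a few lines. The example below computes the mean and sample standard deviation of per-episode accuracies and a paired t statistic between two methods evaluated on the same episodes. The accuracy values are made up for illustration, the function names are hypothetical, and converting the statistic to a p-value (which needs a t-distribution table or a statistics library) is left out.

```python
import math
import statistics

def summarize(accuracies):
    """Mean and sample standard deviation over independent episodes."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for two methods scored on the same episodes."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both methods must be evaluated on the same episodes")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Made-up per-episode accuracies for a proposed method and a baseline.
proposed = [0.5, 0.6, 0.7]
baseline = [0.4, 0.5, 0.5]
mean_acc, std_acc = summarize(proposed)
t_value = paired_t_statistic(proposed, baseline)  # degrees of freedom = n - 1
```

Pairing by episode matters in 1-shot evaluation because both methods see the same support set, so the support-set variance largely cancels in the differences.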

This is aimed at people working on machine learning for wireless signal classification, especially few-shot cases. A reader already familiar with contrastive methods might pick up the specific adaptations for signals and try them, but the current write-up does not yet establish a solid advance.

I would send the full paper to peer review if the experiments include proper statistical controls and the theory is actually derived; otherwise it needs revision first.

Referee Report

2 major / 2 minor

Summary. The paper proposes Dynamic-Consistency Contrastive Learning (DyCo-CL), a geometry-aware SSL framework for few-shot Automatic Modulation Recognition. It couples Virtual Adversarial Augmentation (VAA) with a semantic consistency loss (claimed to act as an implicit spectral regularizer), introduces a Signal-Adaptive Swin Backbone with fixed-window attention, and a Hybrid Knowledge Fusion module that incorporates physical priors. The central claim is a 6.27% accuracy improvement in 1-shot settings on RML benchmarks relative to prior methods.

Significance. If the empirical gains prove robust under repeated sampling and the theoretical regularization claim is independently verifiable, the approach could meaningfully advance few-shot AMR by mitigating augmentation ineffectiveness and spectral instability. No machine-checked proofs, reproducible code artifacts, or parameter-free derivations are mentioned.

major comments (2)

[Abstract / Experiments] The headline result of a 6.27% accuracy gain in 1-shot settings is stated without any mention of the number of independent trials, standard deviation across episodes, support-set sampling procedure, dataset splits, baseline implementations, or statistical significance testing. In 1-shot AMR, accuracy is known to be highly sensitive to the particular support set; a single delta without these controls cannot be treated as evidence that the gain is reliable.
[Theoretical Analysis] The claim that coupling VAA with the semantic consistency loss 'acts as an implicit spectral regularizer enabling stable manifold exploration' is asserted without any visible derivation, eigenvalue bounds, or explicit connection to the encoder's spectrum. No equations are supplied that would allow verification that the combined objective produces the stated regularization effect rather than an ad-hoc combination.

minor comments (2)

Define all acronyms (VAA, DyCo-CL, RML) at first use and ensure consistent notation for loss terms across text and equations.
Figure captions and tables should explicitly state the number of runs and error bars if they are present in the full manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of experimental rigor and theoretical clarity. We address each major comment below and will revise the manuscript accordingly.


Referee: [Abstract / Experiments] The headline result of a 6.27% accuracy gain in 1-shot settings is stated without any mention of the number of independent trials, standard deviation across episodes, support-set sampling procedure, dataset splits, baseline implementations, or statistical significance testing. In 1-shot AMR, accuracy is known to be highly sensitive to the particular support set; a single delta without these controls cannot be treated as evidence that the gain is reliable.

Authors: We agree that these details are essential for establishing reliability, particularly given the sensitivity of 1-shot AMR to support-set choice. The revised manuscript will report results averaged over 100 independent episodes with standard deviations, specify the random support-set sampling procedure (with fixed seeds for reproducibility), confirm the RML dataset splits, describe baseline re-implementations, and include statistical significance tests such as paired t-tests against the strongest baselines. revision: yes
Referee: [Theoretical Analysis] The claim that coupling VAA with the semantic consistency loss 'acts as an implicit spectral regularizer enabling stable manifold exploration' is asserted without any visible derivation, eigenvalue bounds, or explicit connection to the encoder's spectrum. No equations are supplied that would allow verification that the combined objective produces the stated regularization effect rather than an ad-hoc combination.

Authors: The current Theoretical Analysis section offers a high-level indication based on the geometry-aware coupling of the losses. We acknowledge that it lacks explicit derivations and equations for independent verification. The revision will expand this section to include the combined objective function, a step-by-step derivation of the implicit spectral regularization effect, and connections to the encoder spectrum where analytically tractable. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical result with no visible derivations or self-referential reductions


The paper reports an empirical 6.27% accuracy gain on RML benchmarks in 1-shot AMR and mentions a theoretical analysis of VAA plus semantic consistency loss as an implicit spectral regularizer, but supplies no equations, derivations, or load-bearing steps. No self-definitional claims, fitted inputs renamed as predictions, or self-citation chains appear in the provided text. The central claim is therefore an experimental outcome rather than a derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be audited. The framework itself is presented as novel but its dependence on unstated assumptions about augmentation geometry and physical priors cannot be quantified.

pith-pipeline@v0.9.1-grok · 5672 in / 1181 out tokens · 24204 ms · 2026-06-29T19:47:38.885377+00:00 · methodology


Reference graph

Works this paper leans on

5 extracted references · 3 canonical work pages

[1]

Feng, Y., Duan, R., Li, S., Cheng, P., and Liu, W

doi: 10.1109/LCOMM.2022.3225566. Feng, Y., Duan, R., Li, S., Cheng, P., and Liu, W. A dual-branch network with feature assistance for automatic modulation recognition. IEEE Signal Processing Letters, 32:701–705, 2025. doi: 10.1109/LSP.2025.3527901. Hazza, A., Shoaib, M., Alshebeili, S. A., and Fahad, A. An overview of feature-based methods for digital mo...

work page doi:10.1109/lcomm.2022.3225566 2022
[2]

Liang, X., Sang, R., Qian, Y., Guo, Q., Li, F., and Du, L

doi: 10.1109/TVT.2024.3483204. Liang, X., Sang, R., Qian, Y., Guo, Q., Li, F., and Du, L. Robust automatic modulation classification with fuzzy regularization. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=DDIGCk25BO. Liu, F., Pan, J., and Zhou, R. Contrastive learning-based multimodal fusion mo...

work page doi:10.1109/tvt.2024.3483204 2024
[3]

Instability of Global Attention. For standard dot-product attention, Kim et al. (2021) proved that the Jacobian $J_{global}$ is a dense matrix, and its spectral norm scales with the sequence length: $\sup_X \sigma_{\max}(J_{global}(X)) = O(\sqrt{L})$ (49). As $L \to \infty$, the Lipschitz constant diverges, causing spectral instability

2021
[4]

The function $f_{fixed}$ operates independently on each window

Stability of Fixed-Window Attention. In our backbone, $X$ is partitioned into $N_w = L/M$ non-overlapping windows $\{W_k\}$. The function $f_{fixed}$ operates independently on each window. Consequently, the Jacobian $J_{fixed}$ is strictly block-diagonal: $J_{fixed} = \mathrm{diag}(J_1, \ldots, J_{N_w})$ (50), where $J_k$ is the local Jacobian for the $k$-th window
[5]

Since $M$ is a fixed constant (e.g., $M = 16$) and $M \ll L$, $C_M$ is independent of $L$

Derivation of the Bound. The largest singular value of a block-diagonal matrix is the maximum of the singular values of its blocks: $\sigma_{\max}(J_{fixed}) = \max_k \sigma_{\max}(J_k)$ (51). Let $C_M = \sup_W \sigma_{\max}(J_\phi(W))$ be the Lipschitz constant of the local window attention. Since $M$ is a fixed constant (e.g., $M = 16$) and $M \ll L$, $C_M$ is independent of $L$. Thus: $K_{f_{fixed}}(X) = \sigma_{\max}(J_{fixed}) \le$ ...

work page arXiv 2013
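The block-diagonal argument in items [4] and [5] rests on a linear-algebra fact that is easy to check numerically: the largest singular value of a block-diagonal matrix equals the largest of its blocks' largest singular values. The sketch below verifies this on two small hand-picked blocks using power iteration on JᵀJ. The blocks stand in for per-window Jacobians and are not taken from the paper; `spectral_norm` and `block_diag` are illustrative helper names.

```python
import math
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def spectral_norm(A, iters=500, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    rng = random.Random(seed)
    At = transpose(A)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = math.sqrt(sum(x * x for x in w))
        if n == 0.0:
            return 0.0
        v = [x / n for x in w]
    Av = matvec(A, v)
    return math.sqrt(sum(x * x for x in Av))

def block_diag(blocks):
    """Assemble diag(J_1, ..., J_Nw) from a list of dense blocks."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        for row in b:
            pad_right = total_cols - offset - len(row)
            out.append([0.0] * offset + list(row) + [0.0] * pad_right)
        offset += len(b[0])
    return out

# Two toy "window Jacobians" with known largest singular values 3 and 2.
blocks = [[[3.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
per_block = [spectral_norm(b) for b in blocks]
full = spectral_norm(block_diag(blocks))
```

Because the full norm is the maximum over blocks, adding more windows (a longer sequence at fixed window size) does not increase it, which is the intuition behind the claim that the bound is independent of the sequence length.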
