Quaternion Self-Attention with Shared Scores

Hideaki Tamori; Shogo Yamauchi; Tohru Nitta

arXiv: 2605.24920 · v1 · pith:CFS7KDB3new · submitted 2026-05-24 · cs.LG · cs.AI · stat.ML

Pith reviewed 2026-06-30 12:04 UTC · model grok-4.3

classification cs.LG · cs.AI · stat.ML

keywords quaternion self-attention · shared scores · component pre-mixing · quaternion inner product · speech enhancement · attention efficiency · parameter efficiency

The pith

When quaternion linear projections pre-mix components, shared attention scores span the same interaction subspace as independent component-wise scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a shared-score quaternion self-attention that computes one real-valued score from the quaternion inner product and applies the same distribution to all components. This reduces score multiplications by 75% and softmax operations from four to one. The authors prove that when queries and keys come from quaternion linear projections inducing component pre-mixing, the shared scores and component-wise scores lie in the same interaction subspace. This means independent component-wise attention mainly re-parameterizes the same interactions rather than expanding the feature interaction space. Experiments show reduced inference time in speech enhancement while keeping quality.

Core claim

When queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space.

What carries the argument

The shared-score mechanism that computes a single real-valued score using the quaternion inner product and shares the resulting attention distribution across components.
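
The mechanism can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each token is a list of quaternions stored as (r, i, j, k) tuples, sums the quaternion inner product over the feature dimension, and scales by 1/sqrt(4d). The function names, the data layout, and the scaling factor are all assumptions made for the example.

```python
import math

def quat_inner(p, q):
    """Quaternion inner product: one real number from four component products."""
    return sum(a * b for a, b in zip(p, q))

def softmax(xs):
    """Numerically stable softmax over a list of real scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def shared_score_attention(Q, K, V):
    """Q, K, V: lists of tokens; each token is a list of (r, i, j, k) tuples.

    Returns (outputs, weights). One softmax per query produces weights that
    are reused for all four components of every value quaternion.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(4 * d)  # assumed scaling, not taken from the paper
    outputs, weights = [], []
    for q_tok in Q:
        scores = [
            scale * sum(quat_inner(q, k) for q, k in zip(q_tok, k_tok))
            for k_tok in K
        ]
        w = softmax(scores)  # single shared attention distribution
        out = [
            tuple(sum(w[m] * V[m][c][comp] for m in range(len(V)))
                  for comp in range(4))
            for c in range(len(V[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy usage: two tokens, one quaternion feature each.
Q = [[(1.0, 0.0, 0.0, 0.0)], [(0.0, 1.0, 0.0, 0.0)]]
K = [[(1.0, 0.0, 0.0, 0.0)], [(0.0, 1.0, 0.0, 0.0)]]
V = [[(1.0, 2.0, 3.0, 4.0)], [(5.0, 6.0, 7.0, 8.0)]]
outputs, weights = shared_score_attention(Q, K, V)
```

Because the same weights multiply the r, i, j, and k parts of every value, the four output components of a token are always mixed from the same positions, which is the property that distinguishes this variant from component-wise attention.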

If this is right

Score computation multiplications are reduced by 75%.
Softmax operations drop from four to one.
Inference time is reduced by up to 44.3% on GPU and 58.1% on CPU in speech enhancement without quality loss.
Similar efficiency gains appear in vision and natural language processing tasks.
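
One plausible accounting behind the 75% figure is sketched below. It assumes the component-wise baseline evaluates a naive Hamilton product (16 real multiplications per quaternion pair) while the shared score needs only the inner product (4 real multiplications per pair). The helper names and the counting model are assumptions for illustration; the paper's own complexity analysis is not reproduced on this page.

```python
def hamilton_product(p, q):
    """Hamilton product p * q, written out naively with 16 real multiplications."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def conjugate(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    r, i, j, k = q
    return (r, -i, -j, -k)

def inner_product(p, q):
    """Quaternion inner product with 4 real multiplications."""
    return sum(a * b for a, b in zip(p, q))

def score_multiplications(seq_len, quat_dim):
    """Count real multiplications for all query-key pairs (assumed counting model)."""
    pairs = seq_len * seq_len * quat_dim
    component_wise = 16 * pairs   # full Hamilton product per pair
    shared = 4 * pairs            # inner product per pair
    return component_wise, shared, 1 - shared / component_wise

component_wise, shared, reduction = score_multiplications(seq_len=10, quat_dim=8)
```

The inner product equals the real part of the Hamilton product of one quaternion with the conjugate of the other, so the shared score can be read as keeping one of the four outputs of the full product.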

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

When the pre-mixing condition holds, practitioners can adopt the shared-score version for its efficiency gains without losing interaction capacity.
The pre-mixing property of quaternion projections could be checked in other models to confirm the equivalence.
Shared scoring ideas may apply to other multi-dimensional neural network components.
These reductions could support scaling quaternion models to larger sizes or lower-resource hardware.

Load-bearing premise

The quaternion linear projections for queries and keys induce component pre-mixing.
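
To make the premise concrete, the sketch below writes left multiplication by a single quaternion weight as a real 4x4 matrix and checks whether every output component depends on every input component. Reading "component pre-mixing" this way is an assumption made for illustration; the paper's formal definition is not shown on this page, and the function names are hypothetical.

```python
def left_mult_matrix(w):
    """Real 4x4 matrix of the map x -> w * x (Hamilton product), w = (a, b, c, d)."""
    a, b, c, d = w
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]

def matvec(matrix, x):
    """Apply a real matrix (list of rows) to a quaternion stored as a 4-tuple."""
    return tuple(sum(m * v for m, v in zip(row, x)) for row in matrix)

def mixes_all_components(w, tol=1e-12):
    """True when every output component has a nonzero weight on every input component."""
    return all(abs(entry) > tol for row in left_mult_matrix(w) for entry in row)

# i * j = k: the weight i moves the j component of the input into the k output.
projected = matvec(left_mult_matrix((0, 1, 0, 0)), (0, 0, 1, 0))

generic_weight_mixes = mixes_all_components((0.5, -1.2, 0.3, 2.0))
axis_weight_mixes = mixes_all_components((0, 1, 0, 0))
```

Under this reading, a weight with all four parts nonzero mixes every input component into every output component, whereas a weight aligned with a single axis only permutes and negates components.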

What would settle it

Finding quaternion linear projections that fail to induce component pre-mixing and, as a result, produce different spanned subspaces for shared versus component-wise scores.

Figures

Figures reproduced from arXiv: 2605.24920 by Hideaki Tamori, Shogo Yamauchi, Tohru Nitta.

**Figure 1.** Comparison of quaternion self-attention architectures. Left: Tay et al. (2019) compute the full Hamilton product and apply four independent softmax operations, which leads to high computational cost and produces component-wise attention distributions. Right (ours): the proposed method derives a single shared score via the quaternion inner product, reducing multiplications by 75% while preserving inter-component relati…

**Figure 2.** Visualization of the first layer in the QTransformer bottleneck. The attention mechanism of Tay et al. (2019) attends to different positions for each component.

**Figure 3.** From left to right: distribution of the attention output of the proposed method (flattened O_ours values), output distribution of Tay et al. (2019) (flattened O_Tay values), and the quantile correlation between the two distributions. The results demonstrate a high degree of similarity, with a Kolmogorov–Smirnov (KS) statistic of 0.0128, a Wasserstein distance of 0.028, and an extremely high…

**Figure 4.** Correlation analysis of attention outputs. (a) Outputs from independently trained models on the same input show near-zero correlation (corr = 0.028). (b, c) Applying both formulas to identical (Q, K, V) yields high correlation (corr = 0.78–0.88), indicating similar effective capacity despite different algebraic structures.

**Figure 5.** Log-magnitude spectrogram comparison on VoiceBank+DEMAND (a representative test utterance; all panels share the same color scale). The real-valued Conformer leaves residual low-frequency noise, whereas QTN (Yang et al., 2023) tends to over-suppress low-frequency components. Both quaternion attention methods (ours and that of Tay et al. (2019)) are visually close to the clean reference.

**Figure 6.** Overview of the proposed quaternion speech enhancement model. (a) Overall encoder–bottleneck–decoder architecture. (b) Quaternion Transformer/Conformer block with the proposed shared-score quaternion self-attention. (c) QDilated DenseNet module with gated activation.
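
The Figure 3 caption compares two flattened output distributions using a KS statistic and a Wasserstein distance. The sketch below shows how such statistics can be computed for two one-dimensional samples in plain Python. It is an illustration under stated assumptions (the Wasserstein helper requires equal-sized samples, and both function names are hypothetical), not the evaluation code used in the paper.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance for equal-sized samples (assumed restriction)."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Toy usage with two small samples standing in for flattened attention outputs.
same_ks = ks_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted_ks = ks_statistic([0.0, 0.0], [1.0, 1.0])
shifted_w = wasserstein_1d([0.0, 0.0], [1.0, 1.0])
```

Values near zero for both statistics, as reported in the caption, indicate that the two output distributions are close.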

Original abstract

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Shared-score quaternion attention gives a clean 75% cut in score multiplications plus a subspace proof that holds only when the linear projections pre-mix components.

The new piece is the shared-score construction: one real-valued attention score from the quaternion inner product, then the same distribution applied to all four components. That drops the softmax count from four to one and the score multiplications by 75%. The paper also supplies an explicit proof that, when queries and keys come from quaternion linear layers that pre-mix the components, the component-wise scores and the shared scores occupy the same interaction subspace. That is a direct algorithmic change backed by a mathematical statement rather than just another empirical tweak.

The experiments report clear inference-time wins for speech enhancement (up to 44.3% on GPU and 58.1% on CPU) while quality holds, with similar patterns shown on vision and NLP tasks. The efficiency numbers are concrete and the subspace claim is stated as conditional on the pre-mixing property, which matches the architecture they actually use.

The main soft spot is that the equivalence proof rests on the pre-mixing condition being satisfied by the chosen quaternion linear projections. The abstract flags this, but the full verification details are not visible here, so a referee would want to see exactly how the projections are constructed and whether the condition is automatic or requires extra checks. The quality-maintenance claim is also stated at a high level; more granular metric tables would strengthen it.

This is useful for anyone already working inside quaternion networks or parameter-efficient multi-dimensional models. It is not going to shift the broader attention literature, but the change is simple enough and the supporting math is explicit enough that it deserves a serious referee rather than a desk reject.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a shared-score quaternion self-attention mechanism that replaces component-wise scores and independent softmaxes with a single real-valued score computed via the quaternion inner product. This yields a 75% reduction in score-computation multiplications and reduces softmax operations from four to one. Under the stated condition that queries and keys arise from quaternion linear projections inducing component pre-mixing, the authors prove that component-wise and shared scores occupy the same interaction subspace, implying that independent component-wise attention largely re-parameterizes rather than expands the interaction space. Experiments report inference-time reductions of up to 44.3% (GPU) and 58.1% (CPU) on speech enhancement while preserving quality, with analogous trends shown for vision and NLP tasks.

Significance. If the conditional subspace result holds, the work supplies both a concrete efficiency improvement for quaternion networks and a theoretical clarification of the representational capacity of component-wise versus shared attention. The explicit proof of subspace equivalence (when the pre-mixing premise is met) and the reproducible efficiency measurements constitute clear strengths.

major comments (1)

[Abstract / proof section] The subspace-equivalence claim is explicitly conditional on quaternion linear projections inducing component pre-mixing, yet the manuscript provides no explicit verification procedure, numerical check, or experimental confirmation that this condition holds for the projections used in the reported models. Because the claim that independent attention 'primarily re-parameterizes the same interactions' rests on this premise, its verification details are load-bearing.

minor comments (2)

[Abstract] The abstract states the 75% multiplication reduction and the four-to-one softmax reduction; these figures should be cross-referenced to the precise operation counts in the complexity analysis section for immediate verification.
[Method section] Notation for the quaternion inner product and the resulting real-valued score should be introduced with a single displayed equation early in the method section to avoid repeated inline definitions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive comment on the conditional nature of the subspace-equivalence result. We address the concern directly below.

Referee: [Abstract / proof section] The subspace-equivalence claim is explicitly conditional on quaternion linear projections inducing component pre-mixing, yet the manuscript provides no explicit verification procedure, numerical check, or experimental confirmation that this condition holds for the projections used in the reported models. Because the claim that independent attention 'primarily re-parameterizes the same interactions' rests on this premise, its verification details are load-bearing.

Authors: We agree that the manuscript does not supply an explicit verification procedure or numerical check confirming that the quaternion linear projections in the reported models induce component pre-mixing. The theoretical claim is therefore presented conditionally, and the absence of such a check leaves the practical applicability of the subspace result less firmly established than it could be. In the revised version we will add a short subsection (placed after the proof) that (i) states a concrete, reproducible verification procedure based on inspecting the effective mixing induced by the quaternion weight matrices and (ii) reports a numerical check on the actual projection layers used in the speech-enhancement, vision, and NLP experiments. This addition will make the load-bearing premise verifiable without altering the existing proof or experimental results.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper defines a new shared-score quaternion self-attention via the quaternion inner product (a direct algorithmic change) and states a conditional mathematical proof that, under quaternion linear projections inducing component pre-mixing, component-wise and shared scores occupy the same interaction subspace. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, ansatz smuggled via citation, or renaming of a known result. The proof is presented as an independent mathematical claim conditional on an explicitly stated premise; the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms are introduced; the work operates inside the existing quaternion algebra and standard attention framework.

Reference graph

Works this paper leans on

48 extracted references · 30 canonical work pages

[2]

Cao, R., Abdulatif, S., and Bin, Y. CMGAN: Conformer-based metric gan for speech enhancement. pp. 936–940, 09 2022. doi:10.21437/Interspeech.2022-517
[3]

Chao, R., Cheng, W.-H., Quatra, M. L., Siniscalchi, S. M., Yang, C.-H. H., Fu, S.-W., and Tsao, Y. An investigation of incorporating mamba for speech enhancement. In 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 302–308, 2024. doi:10.1109/SLT61566.2024.10832332
[4]

Chen, W., Wang, W., Peng, B., Wen, Q., Zhou, T., and Sun, L. Learning to rotate: Quaternion transformer for complicated periodical time series forecasting. KDD '22, pp. 146–156, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393850. doi:10.1145/3534678.3539234
[5]

Dao, T., Fu, D. Y., Ermon, S., Rudra, A., and Ré, C. Flashattention: fast and memory-efficient exact attention with io-awareness. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA, 2022. Curran Associates Inc. ISBN 9781713871088
[6]

Fu, S.-W., Yu, C., Hsieh, T.-A., Plantinga, P., Ravanelli, M., Lu, X., and Tsao, Y. MetricGAN+: An improved version of MetricGAN for speech enhancement. In Interspeech, pp. 201–205, 2021. doi:10.21437/Interspeech.2021-599
[7]

Gaudet, C. and Maida, A. Deep quaternion networks. pp. 1–8, 07 2018. doi:10.1109/IJCNN.2018.8489651
[8]

Grant, B. and Wang, P. Quaternion approximation networks for enhanced image classification and oriented object detection, 2025. URL https://arxiv.org/abs/2509.05512
[9]

Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., and Pang, R. Conformer: Convolution-augmented transformer for speech recognition. In Proc. Interspeech, pp. 5036–5040, 2020
[10]

Gürlebeck, K., Habetha, K., and Sprößig, W. Holomorphic functions in the plane and n-dimensional space. 2007. URL https://api.semanticscholar.org/CorpusID:117407172
[11]

Hamilton, W. R. S. On quaternions, or on a new system of imaginaries in algebra. 1847
[12]

Hashim, H. F. B. and Ogawa, T. Estimation of forearm motion based on emg using quaternion neural network. Journal of Advanced Computational Intelligence and Intelligent Informatics, 26(3):269–278, 2022. doi:10.20965/jaciii.2022.p0269
[13]

Hu, Y. and Loizou, P. C. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1):229–238, 2008. doi:10.1109/TASL.2007.911054
[14]

Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J., Zhang, B., and Xie, L. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. In Interspeech, pp. 2472–2476, 2020. doi:10.21437/Interspeech.2020-2537
[15]

Isokawa, T., Kusakabe, T., Matsui, N., and Peper, F. Quaternion neural network and its application. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 318–324. Springer, 2003
[16]

Isokawa, T., Matsui, N., and Nishimura, H. Quaternionic neural networks: Fundamental properties and applications. In Complex-Valued Neural Networks: Utilizing High-Dimensional Parameters, pp. 411–439. 2009
[17]

Kim, E. and Seo, H. Se-conformer: Time-domain speech enhancement using conformer. pp. 2736–2740, 08 2021. doi:10.21437/Interspeech.2021-2207
[18]

Kim, J., El-Khamy, M., and Lee, J. End-to-end multi-task denoising for joint sdr and pesq optimization. ArXiv, abs/1901.09146, 2019. URL https://api.semanticscholar.org/CorpusID:59316572
[19]

Krizhevsky, A. Learning multiple layers of features from tiny images. pp. 32–33, 2009. URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
[20]

Le Roux, J., Wisdom, S., Erdogan, H., and Hershey, J. R. SDR – Half-Baked or Well Done? In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 626–630, 2019. doi:10.1109/ICASSP.2019.8683855
[21]

Li, S., Liu, H., Zhou, Y., and Luo, Z. A si-sdr loss function based monaural source separation. In 2020 15th IEEE International Conference on Signal Processing (ICSP), volume 1, pp. 356–360, 2020. doi:10.1109/ICSP48669.2020.9321080
[22]

Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7
[23]

Mukhopadhyay, A., Joshi, R. B., Tiwari, N., and Mishra, S. Transformers at a fraction. In Northern Lights Deep Learning Conference 2025, 2024. URL https://openreview.net/forum?id=1U0kkt7ymn
[24]

Muppidi, A. and Radfar, M. Speech emotion recognition using quaternion convolutional neural networks. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6309–6313, 2021. doi:10.1109/ICASSP39728.2021.9414248
[25]

Nitta, T. A quaternary version of the back-propagation algorithm. In Proceedings of ICNN'95 - International Conference on Neural Networks, volume 5, pp. 2753–2756, 1995. doi:10.1109/ICNN.1995.488166
[26]

Parcollet, T., Zhang, Y., Morchid, M., Trabelsi, C., Linarès, G., De Mori, R., and Bengio, Y. Quaternion convolutional neural networks for end-to-end automatic speech recognition. 06 2018. doi:10.21437/Interspeech.2018-1898
[27]

Pascual, S., Bonafonte, A., and Serrà, J. Segan: Speech enhancement generative adversarial network. pp. 3642–3646, 08 2017. doi:10.21437/Interspeech.2017-1428
[28]

Reddy, C. K. A., Dubey, H., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., and Srinivasan, S. Icassp 2021 deep noise suppression challenge. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6623–6627, 2021a. doi:10.1109/ICASSP39728.2021.9415105
[29]

Reddy, C. K. A., Gopal, V., and Cutler, R. Dnsmos: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6493–6497, 2021b. doi:10.1109/ICASSP39728.2021.9414878
[30]

Rix, A. W., Beerends, J. G., Hollier, M. P., and Hekstra, A. P. Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pp. 749–752, 2001. doi:10.1109/ICASSP.2001.941023
[31]

Saleem, N., Gunawan, T. S., Kartiwi, M., Nugroho, B. S., and Wijayanto, I. Nse-catnet: Deep neural speech enhancement using convolutional attention transformer network. IEEE Access, 11:66979–66994, 2023. doi:10.1109/ACCESS.2023.3290908
[32]

Scheibler, R., Fujita, Y., Shirahata, Y., and Komatsu, T. Universal Score-based Speech Enhancement with High Content Preservation. In Interspeech 2024, pp. 1165–1169, 2024. doi:10.21437/Interspeech.2024-138
[33]

Schröter, H., Escalante-B., A. N., Rosenkranz, T., and Maier, A. DeepFilterNet: Perceptually motivated real-time speech enhancement. In Interspeech 2023, pp. 2008–2009, 2023. URL https://www.isca-archive.org/interspeech_2023/schroter23b_interspeech.html
[34]

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. UR...
[35]

Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4214–4217, 2010. doi:10.1109/ICASSP.2010.5495701
[36]

Tay, Y., Zhang, A., Luu, A. T., Rao, J., Zhang, S., Wang, S., Fu, J., and Hui, S. C. Lightweight and efficient neural natural language processing with quaternion networks. In Korhonen, A., Traum, D., and Màrquez, L. (eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1494–1503, Florence, Italy, July 2... doi:10.18653/v1/p19-1145
[37]

Thiemann, J., Ito, N., and Vincent, E. The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings. In Proceedings of Meetings on Acoustics, volume 19, pp. 035081. Acoustical Society of America, 2013
[38]

Valentini-Botinhao, C., Wang, X., Takaki, S., and Yamagishi, J. Investigating rnn-based speech enhancement methods for noise-robust text-to-speech. In 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), pp. 146–152, 2016. doi:10.21437/SSW.2016-24
[39]

van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. Wavenet: A generative model for raw audio. In 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), pp. 125, 2016
[40]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017
[41]

Veaux, C., Yamagishi, J., and King, S. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database. In Proc. Int. Conf. Oriental COCOSDA, November 2013
[42]

Wang, K., He, B., and Zhu, W.-P. TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain. In ICASSP, pp. 7098–7102, 2021
[43]

Yamamoto, R., Song, E., and Kim, J.-M. Parallel wavegan: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6199–6203, 2020. doi:10.1109/ICASSP40776.2020.9053795
[44]

Yamauchi, S., Nitta, T., and Ohnishi, T. Learning characteristics of reverse quaternion neural network. In 2025 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2025. doi:10.1109/IJCNN64981.2025.11228907
[45]

Yan, H., Zhang, J., Fan, C., Zhou, Y., and Liu, P. Lisennet: Lightweight sub-band and dual-path modeling for real-time speech enhancement. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, 2025. doi:10.1109/ICASSP49660.2025.10888272
[46]

Yang, X., Cao, W., Lu, Y., and Zhou, Y. Qtn: Quaternion transformer network for hyperspectral image classification. IEEE Transactions on Circuits and Systems for Video Technology, 33(12):7370–7384, 2023. doi:10.1109/TCSVT.2023.3283289
[47]

Zhou, Z., Huo, Y., Huang, G., Zeng, A., Chen, X., Huang, L., and Li, Z. Qean: quaternion-enhanced attention network for visual dance generation. Vis. Comput., 41(2):961–973, April 2024. ISSN 0178-2789. doi:10.1007/s00371-024-03376-5
[48]

Zhu, X., Xu, Y., Xu, H., and Chen, C. Quaternion convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

[2] [2]

CMGAN : Conformer-based metric gan for speech enhancement

Cao, R., Abdulatif, S., and Bin, Y. CMGAN : Conformer-based metric gan for speech enhancement. pp.\ 936--940, 09 2022. doi:10.21437/Interspeech.2022-517

work page doi:10.21437/interspeech.2022-517 2022

[3] [3]

L., Siniscalchi, S

Chao, R., Cheng, W.-H., Quatra, M. L., Siniscalchi, S. M., Yang, C.-H. H., Fu, S.-W., and Tsao, Y. An investigation of incorporating mamba for speech enhancement. In 2024 IEEE Spoken Language Technology Workshop (SLT), pp.\ 302--308, 2024. doi:10.1109/SLT61566.2024.10832332

work page doi:10.1109/slt61566.2024.10832332 2024

[4] [4]

Learning to rotate: Quaternion transformer for complicated periodical time series forecasting

Chen, W., Wang, W., Peng, B., Wen, Q., Zhou, T., and Sun, L. Learning to rotate: Quaternion transformer for complicated periodical time series forecasting. KDD '22, pp.\ 146^^e2^^80^^93156, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393850. doi:10.1145/3534678.3539234

work page doi:10.1145/3534678.3539234 2022

[5] [5]

Y., Ermon, S., Rudra, A., and R\' e , C

Dao, T., Fu, D. Y., Ermon, S., Rudra, A., and R\' e , C. Flashattention: fast and memory-efficient exact attention with io-awareness. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA, 2022. Curran Associates Inc. ISBN 9781713871088

2022

[6] Fu, S.-W., Yu, C., Hsieh, T.-A., Plantinga, P., Ravanelli, M., Lu, X., and Tsao, Y. MetricGAN+: An improved version of MetricGAN for speech enhancement. In Interspeech, pp. 201--205, 2021. doi:10.21437/Interspeech.2021-599

[7] Gaudet, C. and Maida, A. Deep quaternion networks. pp. 1--8, 07 2018. doi:10.1109/IJCNN.2018.8489651

[8] Grant, B. and Wang, P. Quaternion approximation networks for enhanced image classification and oriented object detection, 2025. URL https://arxiv.org/abs/2509.05512

[9] Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., and Pang, R. Conformer: Convolution-augmented transformer for speech recognition. In Proc. Interspeech, pp. 5036--5040, 2020

[10] Gürlebeck, K., Habetha, K., and Sprößig, W. Holomorphic functions in the plane and n-dimensional space. 2007. URL https://api.semanticscholar.org/CorpusID:117407172

[11] Hamilton, W. R. S. On quaternions, or on a new system of imaginaries in algebra. 1847

[12] Hashim, H. F. B. and Ogawa, T. Estimation of forearm motion based on emg using quaternion neural network. Journal of Advanced Computational Intelligence and Intelligent Informatics, 26(3): 269--278, 2022. doi:10.20965/jaciii.2022.p0269

[13] Hu, Y. and Loizou, P. C. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1): 229--238, 2008. doi:10.1109/TASL.2007.911054

[14] Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J., Zhang, B., and Xie, L. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. In Interspeech, pp. 2472--2476, 2020. doi:10.21437/Interspeech.2020-2537

[15] Isokawa, T., Kusakabe, T., Matsui, N., and Peper, F. Quaternion neural network and its application. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 318--324. Springer, 2003

[16] Isokawa, T., Matsui, N., and Nishimura, H. Quaternionic neural networks: Fundamental properties and applications. In Complex-Valued Neural Networks: Utilizing High-Dimensional Parameters, pp. 411--439. 2009

[17] Kim, E. and Seo, H. Se-conformer: Time-domain speech enhancement using conformer. pp. 2736--2740, 08 2021. doi:10.21437/Interspeech.2021-2207

[18] Kim, J., El-Khamy, M., and Lee, J. End-to-end multi-task denoising for joint sdr and pesq optimization. ArXiv, abs/1901.09146, 2019. URL https://api.semanticscholar.org/CorpusID:59316572

[19] Krizhevsky, A. Learning multiple layers of features from tiny images. pp. 32--33, 2009. URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

[20] Le Roux, J., Wisdom, S., Erdogan, H., and Hershey, J. R. SDR -- Half-Baked or Well Done? In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 626--630, 2019. doi:10.1109/ICASSP.2019.8683855

[21] Li, S., Liu, H., Zhou, Y., and Luo, Z. A si-sdr loss function based monaural source separation. In 2020 15th IEEE International Conference on Signal Processing (ICSP), volume 1, pp. 356--360, 2020. doi:10.1109/ICSP48669.2020.9321080

[22] Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

[23] Mukhopadhyay, A., Joshi, R. B., Tiwari, N., and Mishra, S. Transformers at a fraction. In Northern Lights Deep Learning Conference 2025, 2024. URL https://openreview.net/forum?id=1U0kkt7ymn

[24] Muppidi, A. and Radfar, M. Speech emotion recognition using quaternion convolutional neural networks. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6309--6313, 2021. doi:10.1109/ICASSP39728.2021.9414248

[25] Nitta, T. A quaternary version of the back-propagation algorithm. In Proceedings of ICNN'95 - International Conference on Neural Networks, volume 5, pp. 2753--2756, 1995. doi:10.1109/ICNN.1995.488166

[26] Parcollet, T., Zhang, Y., Morchid, M., Trabelsi, C., Linarès, G., De Mori, R., and Bengio, Y. Quaternion convolutional neural networks for end-to-end automatic speech recognition. 06 2018. doi:10.21437/Interspeech.2018-1898

[27] Pascual, S., Bonafonte, A., and Serrà, J. Segan: Speech enhancement generative adversarial network. pp. 3642--3646, 08 2017. doi:10.21437/Interspeech.2017-1428

[28] Reddy, C. K. A., Dubey, H., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., and Srinivasan, S. Icassp 2021 deep noise suppression challenge. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6623--6627, 2021a. doi:10.1109/ICASSP39728.2021.9415105

[29] Reddy, C. K. A., Gopal, V., and Cutler, R. Dnsmos: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6493--6497, 2021b. doi:10.1109/ICASSP39728.2021.9414878

[30] Rix, A. W., Beerends, J. G., Hollier, M. P., and Hekstra, A. P. Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pp. 749--752, 2001. doi:10.1109/ICASSP.2001.941023

[31] Saleem, N., Gunawan, T. S., Kartiwi, M., Nugroho, B. S., and Wijayanto, I. Nse-catnet: Deep neural speech enhancement using convolutional attention transformer network. IEEE Access, 11: 66979--66994, 2023. doi:10.1109/ACCESS.2023.3290908

[32] Scheibler, R., Fujita, Y., Shirahata, Y., and Komatsu, T. Universal Score-based Speech Enhancement with High Content Preservation. In Interspeech 2024, pp. 1165--1169, 2024. doi:10.21437/Interspeech.2024-138

[33] Schröter, H., Escalante-B., A. N., Rosenkranz, T., and Maier, A. DeepFilterNet: Perceptually motivated real-time speech enhancement. In Interspeech 2023, pp. 2008--2009, 2023. URL https://www.isca-archive.org/interspeech_2023/schroter23b_interspeech.html

[34] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631--1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. UR...

[35] Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4214--4217, 2010. doi:10.1109/ICASSP.2010.5495701

[36] Tay, Y., Zhang, A., Luu, A. T., Rao, J., Zhang, S., Wang, S., Fu, J., and Hui, S. C. Lightweight and efficient neural natural language processing with quaternion networks. In Korhonen, A., Traum, D., and Màrquez, L. (eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1494--1503, Florence, Italy, July 2... doi:10.18653/v1/p19-1145

[37] Thiemann, J., Ito, N., and Vincent, E. The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings. In Proceedings of Meetings on Acoustics, volume 19, pp. 035081. Acoustical Society of America, 2013

[38] Valentini-Botinhao, C., Wang, X., Takaki, S., and Yamagishi, J. Investigating rnn-based speech enhancement methods for noise-robust text-to-speech. In 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), pp. 146--152, 2016. doi:10.21437/SSW.2016-24

[39] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. Wavenet: A generative model for raw audio. In 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), pp. 125, 2016

[40] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017

[41] Veaux, C., Yamagishi, J., and King, S. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database. In Proc. Int. Conf. Oriental COCOSDA, November 2013

[42] Wang, K., He, B., and Zhu, W.-P. TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain. In ICASSP, pp. 7098--7102, 2021

[43] Yamamoto, R., Song, E., and Kim, J.-M. Parallel wavegan: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6199--6203, 2020. doi:10.1109/ICASSP40776.2020.9053795

[44] Yamauchi, S., Nitta, T., and Ohnishi, T. Learning characteristics of reverse quaternion neural network. In 2025 International Joint Conference on Neural Networks (IJCNN), pp. 1--8, 2025. doi:10.1109/IJCNN64981.2025.11228907

[45] Yan, H., Zhang, J., Fan, C., Zhou, Y., and Liu, P. Lisennet: Lightweight sub-band and dual-path modeling for real-time speech enhancement. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1--5, 2025. doi:10.1109/ICASSP49660.2025.10888272

[46] Yang, X., Cao, W., Lu, Y., and Zhou, Y. Qtn: Quaternion transformer network for hyperspectral image classification. IEEE Transactions on Circuits and Systems for Video Technology, 33(12): 7370--7384, 2023. doi:10.1109/TCSVT.2023.3283289

[47] Zhou, Z., Huo, Y., Huang, G., Zeng, A., Chen, X., Huang, L., and Li, Z. Qean: quaternion-enhanced attention network for visual dance generation. Vis. Comput., 41(2): 961--973, April 2024. ISSN 0178-2789. doi:10.1007/s00371-024-03376-5

[48] Zhu, X., Xu, Y., Xu, H., and Chen, C. Quaternion convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018