Batch Normalization for Neural Networks on Complex Domains
Pith reviewed 2026-05-09 19:17 UTC · model grok-4.3
The pith
Batch normalization extends to neural networks on complex domains by adapting its centering and scaling operations to the geometry of those domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition demonstrating the efficacy of our method.
What carries the argument
Complex-domain batch normalization layers that adapt centering and scaling operations to the geometry of complex manifolds such as the Siegel disk.
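The recipe is easiest to see on the simplest complex domain, the open unit disc B = {x ∈ C : |x| < 1}. Below is a minimal PyTorch sketch of a disc-valued BN layer in the Riemannian BN style: map the batch Fréchet mean to the origin with a disc automorphism, normalize geodesic dispersion in the tangent space, then translate to a learned bias. The names (phi, DiscBatchNorm) and the fixed-point mean iteration are illustrative assumptions, not the paper's implementation; the Siegel-disk case would swap in the matrix analog of the automorphism.

    import torch

    EPS = 1e-6

    def phi(x, y):
        # Disc automorphism sending x to the origin: phi_x(y) = (y - x) / (1 - conj(x) y).
        return (y - x) / (1 - x.conj() * y)

    def log0(u):
        # Log map at the origin of the Poincare disc (metric |v| / (1 - |x|^2)).
        r = u.abs().clamp(min=EPS, max=1 - EPS)
        return torch.atanh(r) * (u / r)

    def exp0(v):
        # Exp map at the origin; inverse of log0.
        n = v.abs().clamp(min=EPS)
        return torch.tanh(n) * (v / n)

    def frechet_mean(z, iters=20):
        # Fixed-point (Karcher-flow) approximation of the Frechet mean on the disc.
        m = torch.zeros((), dtype=z.dtype)
        for _ in range(iters):
            v = log0(phi(m, z)).mean()   # mean tangent vector after moving m to the origin
            m = phi(-m, exp0(v))         # step back on the manifold: phi_{-m} inverts phi_m
        return m

    class DiscBatchNorm(torch.nn.Module):
        # Centers the batch Frechet mean at the origin, normalizes geodesic
        # dispersion, then shifts to a learned bias (training-mode statistics
        # only; running estimates for inference are omitted).
        def __init__(self):
            super().__init__()
            self.log_gamma = torch.nn.Parameter(torch.zeros(()))  # learned scale
            # Learned bias; unconstrained for brevity, a real layer would keep |beta| < 1.
            self.beta = torch.nn.Parameter(torch.zeros((), dtype=torch.cfloat))

        def forward(self, z):  # z: (batch,) complex tensor with |z| < 1
            m = frechet_mean(z)
            u = phi(m, z)  # center: batch Frechet mean -> origin
            r = torch.atanh(u.abs().clamp(max=1 - EPS))
            sigma = (r ** 2).mean().sqrt().clamp(min=EPS)  # geodesic dispersion about the mean
            u = exp0((self.log_gamma.exp() / sigma) * log0(u))  # rescale radii in the tangent space
            return phi(-self.beta, u)  # translate the origin to the learned bias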
If this is right
- Complex-valued networks for radar and signal-processing tasks become easier to train at scale.
- Graph neural networks operating on complex node features gain a standard stabilization tool.
- Action recognition pipelines that use complex representations can adopt the same training practices as real-valued networks.
- Implementation details for the Siegel disk become available for other researchers working on hyperbolic or matrix-valued data.
Where Pith is reading between the lines
- The same derivation strategy could be applied to additional complex domains that currently lack BN support.
- Integration with other complex-valued building blocks such as complex convolutions or activations may become straightforward once the normalization layer is in place.
Load-bearing premise
That these geometrically adapted normalization steps will improve training stability and final accuracy on complex-domain tasks without introducing new computational or numerical problems.
What would settle it
A controlled comparison in which networks equipped with the proposed complex-domain BN layers show neither faster convergence nor higher final accuracy than identical networks trained without normalization or with only Euclidean-style BN.
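A sketch of that protocol, assuming a hypothetical train_and_eval hook into the actual training loop (the variant names and the returned pair are placeholders):

    import statistics

    def train_and_eval(variant: str, seed: int) -> tuple[int, float]:
        # Placeholder: train one fixed architecture with the given normalization
        # variant and random seed; return (epochs to reach a loss target, test accuracy).
        raise NotImplementedError  # wire this up to the real training loop

    VARIANTS = ["no_norm", "euclidean_bn", "complex_domain_bn"]  # identical nets otherwise
    SEEDS = range(5)

    results = {v: [train_and_eval(v, s) for s in SEEDS] for v in VARIANTS}
    for variant, runs in results.items():
        epochs, accs = zip(*runs)
        print(f"{variant}: {statistics.mean(epochs):.1f} epochs to target, "
              f"acc {statistics.mean(accs):.3f} +/- {statistics.stdev(accs):.3f}")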
read the original abstract
Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of principled Riemannian analogs of fundamental building blocks in deep neural networks (DNNs). Among those, Riemannian batch normalization (BN) layers have shown to enhance training stability and improve accuracy. In this paper, we propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition demonstrating the efficacy of our method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes batch normalization (BN) layers for neural networks on complex domains. It establishes close connections to existing Riemannian BN layers, derives practical implementation components (including for less-studied domains such as the Siegel disk), and reports experimental results on radar clutter classification, node classification, and action recognition tasks to demonstrate efficacy.
Significance. If the derivations correctly extend Riemannian BN while preserving domain structure, the work would meaningfully broaden the set of available building blocks for geometric deep learning on complex manifolds. The multi-task experimental validation and focus on under-studied domains such as the Siegel disk address a genuine gap; reproducible code or explicit parameter-free derivations would further strengthen the contribution.
major comments (2)
- [§3] Method: the central claim that the proposed BN layers are a direct, structure-preserving extension of Riemannian BN requires an explicit side-by-side comparison of the update rules and a proof that the derived mean/variance operations remain within the target domain (e.g., the Siegel disk; a membership-check sketch follows this list). Without this, it is unclear whether the construction is novel or reduces to prior work by design.
- [§4] Experiments: the reported accuracy gains on the three tasks are presented without error bars, baseline comparisons to standard Euclidean BN or existing Riemannian BN, or statistical significance tests. This weakens the claim that the method enhances training stability and accuracy in practice.
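As a concrete form of the domain check the first major comment requests, a membership test could assert that every batch statistic (e.g., each Fréchet-mean iterate) stays inside the Siegel disk, here taken in its standard form as complex symmetric matrices z with I − zᴴz positive definite; in_siegel_disk is a name introduced for illustration.

    import numpy as np

    def in_siegel_disk(z: np.ndarray, tol: float = 1e-8) -> bool:
        # Membership test: z complex symmetric (not Hermitian) with I - z^H z positive definite.
        if not np.allclose(z, z.T, atol=tol):
            return False
        gram = np.eye(z.shape[0]) - z.conj().T @ z  # Hermitian by construction
        return np.linalg.eigvalsh(gram).min() > tol

    # Usage: any candidate statistic computed during training can be asserted
    # to stay inside the domain.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    s = (a + a.T) / 2                   # symmetrize
    z = 0.5 * s / np.linalg.norm(s, 2)  # spectral norm 0.5 < 1
    assert in_siegel_disk(z)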
minor comments (2)
- [§2] Notation for complex-domain operations (e.g., the definition of the Siegel disk metric) should be introduced earlier and used consistently to aid readability; an example of the relevant definitions follows this list.
- [§4] Figure captions and axis labels in the experimental plots should explicitly state the metrics (accuracy, loss) and the number of runs averaged.
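For illustration, two of the definitions in question as they appear in the classical literature the paper draws on (Franzoni & Vesentini, 1980; Siegel, 1943): the Poincaré differential metric on the open unit disc, and the Cayley transform linking the Siegel upper half space and the Siegel disk.

    % Poincare differential metric on the open unit disc B = {x in C : |x| < 1}
    % (Franzoni & Vesentini, 1980):
    g_{B}(x, v) = \frac{|v|}{1 - |x|^{2}}, \qquad x \in B, \; v \in \mathbb{C}.

    % Matrix Cayley transform between the Siegel upper half space and the
    % Siegel disk (Siegel, 1943), and its inverse:
    \varphi(x) = (x - iI)(x + iI)^{-1}, \qquad
    \varphi^{(-1)}(x) = i(I + x)(I - x)^{-1}.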
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
- Referee: [§3] Method: the central claim that the proposed BN layers are a direct, structure-preserving extension of Riemannian BN requires an explicit side-by-side comparison of the update rules and a proof that the derived mean/variance operations remain within the target domain (e.g., the Siegel disk). Without this, it is unclear whether the construction is novel or reduces to prior work by design.
Authors: We agree that an explicit side-by-side comparison would improve clarity. While the manuscript emphasizes close connections rather than claiming a direct reduction, we will add a dedicated subsection or table in §3 that juxtaposes the update rules of the proposed complex BN against Riemannian BN. We will also include derivations demonstrating that the Fréchet mean and variance operations map back into the target domain (e.g., the Siegel disk), thereby confirming structure preservation and highlighting the novel handling of complex-valued domains. revision: yes
- Referee: [§4] Experiments: the reported accuracy gains on the three tasks are presented without error bars, baseline comparisons to standard Euclidean BN or existing Riemannian BN, or statistical significance tests. This weakens the claim that the method enhances training stability and accuracy in practice.
Authors: We acknowledge this gap in the experimental presentation. In the revised manuscript we will augment §4 with error bars from multiple random seeds, explicit baseline comparisons against both Euclidean BN and existing Riemannian BN implementations on the radar, graph, and action-recognition tasks, and statistical significance tests (e.g., paired t-tests) to support the reported improvements in stability and accuracy. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper proposes new batch normalization layers for complex domains, explicitly deriving essential implementation components (e.g., for the Siegel disk) and validating them through experiments on radar clutter classification, node classification, and action recognition. The abstract notes close connections to existing Riemannian BN layers without claiming that the new derivations reduce to, or are forced by, those prior results by construction. No equations, fitted parameters, or self-citations are presented as load-bearing premises that equate outputs to inputs; the central claims rest on independent derivation and empirical testing against external benchmarks rather than on renaming, self-definition, or imported uniqueness theorems.