Batch Normalization for Neural Networks on Complex Domains
Pith reviewed 2026-05-09 19:17 UTC · model grok-4.3
The pith
Batch normalization extends to neural networks on complex domains by adapting its centering and scaling operations to the geometry of those domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition demonstrating the efficacy of our method.
What carries the argument
Complex-domain batch normalization layers that adapt centering and scaling operations to the geometry of complex manifolds such as the Siegel disk.
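The recipe is easiest to see on the simplest complex domain, the open unit disc B = {x ∈ C : |x| < 1}. Below is a minimal PyTorch sketch of a disc-valued BN layer in the Riemannian BN style: map the batch Fréchet mean to the origin with a disc automorphism, normalize geodesic dispersion in the tangent space, then translate to a learned bias. The names (phi, DiscBatchNorm) and the fixed-point mean iteration are illustrative assumptions, not the paper's implementation; the Siegel-disk case would swap in the matrix analog of the automorphism.

    import torch

    EPS = 1e-6

    def phi(x, y):
        # Disc automorphism sending x to the origin: phi_x(y) = (y - x) / (1 - conj(x) y).
        return (y - x) / (1 - x.conj() * y)

    def log0(u):
        # Log map at the origin of the Poincare disc (metric |v| / (1 - |x|^2)).
        r = u.abs().clamp(min=EPS, max=1 - EPS)
        return torch.atanh(r) * (u / r)

    def exp0(v):
        # Exp map at the origin; inverse of log0.
        n = v.abs().clamp(min=EPS)
        return torch.tanh(n) * (v / n)

    def frechet_mean(z, iters=20):
        # Fixed-point (Karcher-flow) approximation of the Frechet mean on the disc.
        m = torch.zeros((), dtype=z.dtype)
        for _ in range(iters):
            v = log0(phi(m, z)).mean()   # mean tangent vector after moving m to the origin
            m = phi(-m, exp0(v))         # step back on the manifold: phi_{-m} inverts phi_m
        return m

    class DiscBatchNorm(torch.nn.Module):
        # Centers the batch Frechet mean at the origin, normalizes geodesic
        # dispersion, then shifts to a learned bias (training-mode statistics
        # only; running estimates for inference are omitted).
        def __init__(self):
            super().__init__()
            self.log_gamma = torch.nn.Parameter(torch.zeros(()))  # learned scale
            # Learned bias; unconstrained for brevity, a real layer would keep |beta| < 1.
            self.beta = torch.nn.Parameter(torch.zeros((), dtype=torch.cfloat))

        def forward(self, z):  # z: (batch,) complex tensor with |z| < 1
            m = frechet_mean(z)
            u = phi(m, z)  # center: batch Frechet mean -> origin
            r = torch.atanh(u.abs().clamp(max=1 - EPS))
            sigma = (r ** 2).mean().sqrt().clamp(min=EPS)  # geodesic dispersion about the mean
            u = exp0((self.log_gamma.exp() / sigma) * log0(u))  # rescale radii in the tangent space
            return phi(-self.beta, u)  # translate the origin to the learned bias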
If this is right
- Complex-valued networks for radar and signal-processing tasks become easier to train at scale.
- Graph neural networks operating on complex node features gain a standard stabilization tool.
- Action recognition pipelines that use complex representations can adopt the same training practices as real-valued networks.
- Implementation details for the Siegel disk become available for other researchers working on hyperbolic or matrix-valued data.
Where Pith is reading between the lines
- The same derivation strategy could be applied to additional complex domains that currently lack BN support.
- Integration with other complex-valued building blocks such as complex convolutions or activations may become straightforward once the normalization layer is in place.
Load-bearing premise
That these geometrically adapted normalization steps will improve training stability and final accuracy on complex-domain tasks without introducing new computational or numerical problems.
What would settle it
A controlled comparison in which networks equipped with the proposed complex-domain BN layers show neither faster convergence nor higher final accuracy than identical networks trained without normalization or with only Euclidean-style BN.
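A sketch of that protocol, assuming a hypothetical train_and_eval hook into the actual training loop (the variant names and the returned pair are placeholders):

    import statistics

    def train_and_eval(variant: str, seed: int) -> tuple[int, float]:
        # Placeholder: train one fixed architecture with the given normalization
        # variant and random seed; return (epochs to reach a loss target, test accuracy).
        raise NotImplementedError  # wire this up to the real training loop

    VARIANTS = ["no_norm", "euclidean_bn", "complex_domain_bn"]  # identical nets otherwise
    SEEDS = range(5)

    results = {v: [train_and_eval(v, s) for s in SEEDS] for v in VARIANTS}
    for variant, runs in results.items():
        epochs, accs = zip(*runs)
        print(f"{variant}: {statistics.mean(epochs):.1f} epochs to target, "
              f"acc {statistics.mean(accs):.3f} +/- {statistics.stdev(accs):.3f}")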
read the original abstract
Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of principled Riemannian analogs of fundamental building blocks in deep neural networks (DNNs). Among those, Riemannian batch normalization (BN) layers have shown to enhance training stability and improve accuracy. In this paper, we propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition demonstrating the efficacy of our method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes batch normalization (BN) layers for neural networks on complex domains. It establishes close connections to existing Riemannian BN layers, derives practical implementation components (including for less-studied domains such as the Siegel disk), and reports experimental results on radar clutter classification, node classification, and action recognition tasks to demonstrate efficacy.
Significance. If the derivations correctly extend Riemannian BN while preserving domain structure, the work would meaningfully broaden the set of available building blocks for geometric deep learning on complex manifolds. The multi-task experimental validation and focus on under-studied domains such as the Siegel disk address a genuine gap; reproducible code or explicit parameter-free derivations would further strengthen the contribution.
major comments (2)
- [§3] Method: the central claim that the proposed BN layers are a direct, structure-preserving extension of Riemannian BN requires an explicit side-by-side comparison of the update rules and a proof that the derived mean/variance operations remain within the target domain (e.g., the Siegel disk; a membership-check sketch follows this list). Without this, it is unclear whether the construction is novel or reduces to prior work by design.
- [§4] Experiments: the reported accuracy gains on the three tasks are presented without error bars, baseline comparisons to standard Euclidean BN or existing Riemannian BN, or statistical significance tests. This weakens the claim that the method enhances training stability and accuracy in practice.
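As a concrete form of the domain check the first major comment requests, a membership test could assert that every batch statistic (e.g., each Fréchet-mean iterate) stays inside the Siegel disk, here taken in its standard form as complex symmetric matrices z with I − zᴴz positive definite; in_siegel_disk is a name introduced for illustration.

    import numpy as np

    def in_siegel_disk(z: np.ndarray, tol: float = 1e-8) -> bool:
        # Membership test: z complex symmetric (not Hermitian) with I - z^H z positive definite.
        if not np.allclose(z, z.T, atol=tol):
            return False
        gram = np.eye(z.shape[0]) - z.conj().T @ z  # Hermitian by construction
        return np.linalg.eigvalsh(gram).min() > tol

    # Usage: any candidate statistic computed during training can be asserted
    # to stay inside the domain.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    s = (a + a.T) / 2                   # symmetrize
    z = 0.5 * s / np.linalg.norm(s, 2)  # spectral norm 0.5 < 1
    assert in_siegel_disk(z)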
minor comments (2)
- [§2] Notation for complex-domain operations (e.g., the definition of the Siegel disk metric) should be introduced earlier and used consistently to aid readability; an example of the relevant definitions follows this list.
- [§4] Figure captions and axis labels in the experimental plots should explicitly state the metrics (accuracy, loss) and the number of runs averaged.
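For illustration, two of the definitions in question as they appear in the classical literature the paper draws on (Franzoni & Vesentini, 1980; Siegel, 1943): the Poincaré differential metric on the open unit disc, and the Cayley transform linking the Siegel upper half space and the Siegel disk.

    % Poincare differential metric on the open unit disc B = {x in C : |x| < 1}
    % (Franzoni & Vesentini, 1980):
    g_{B}(x, v) = \frac{|v|}{1 - |x|^{2}}, \qquad x \in B, \; v \in \mathbb{C}.

    % Matrix Cayley transform between the Siegel upper half space and the
    % Siegel disk (Siegel, 1943), and its inverse:
    \varphi(x) = (x - iI)(x + iI)^{-1}, \qquad
    \varphi^{(-1)}(x) = i(I + x)(I - x)^{-1}.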
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
- Referee: [§3] Method: the central claim that the proposed BN layers are a direct, structure-preserving extension of Riemannian BN requires an explicit side-by-side comparison of the update rules and a proof that the derived mean/variance operations remain within the target domain (e.g., the Siegel disk). Without this, it is unclear whether the construction is novel or reduces to prior work by design.
Authors: We agree that an explicit side-by-side comparison would improve clarity. While the manuscript emphasizes close connections rather than claiming a direct reduction, we will add a dedicated subsection or table in §3 that juxtaposes the update rules of the proposed complex BN against Riemannian BN. We will also include derivations demonstrating that the Fréchet mean and variance operations map back into the target domain (e.g., the Siegel disk), thereby confirming structure preservation and highlighting the novel handling of complex-valued domains. revision: yes
- Referee: [§4] Experiments: the reported accuracy gains on the three tasks are presented without error bars, baseline comparisons to standard Euclidean BN or existing Riemannian BN, or statistical significance tests. This weakens the claim that the method enhances training stability and accuracy in practice.
Authors: We acknowledge this gap in the experimental presentation. In the revised manuscript we will augment §4 with error bars from multiple random seeds, explicit baseline comparisons against both Euclidean BN and existing Riemannian BN implementations on the radar, graph, and action-recognition tasks, and statistical significance tests (e.g., paired t-tests) to support the reported improvements in stability and accuracy. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper proposes new batch normalization layers for complex domains, explicitly deriving essential implementation components (e.g., for the Siegel disk) and validating them through experiments on radar clutter classification, node classification, and action recognition. The abstract notes close connections to existing Riemannian BN layers without claiming that the new derivations reduce to, or are forced by, those prior results by construction. No equations, fitted parameters, or self-citations are presented as load-bearing premises that equate outputs to inputs; the central claims rest on independent derivation and empirical testing against external benchmarks rather than on renaming, self-definition, or imported uniqueness theorems.