Complex SGD and Directional Bias in Reproducing Kernel Hilbert Spaces
Pith reviewed 2026-05-08 12:10 UTC · model grok-4.3
The pith
Complex SGD converges under the same assumptions as real SGD and extends directional bias to kernel regression in complex RKHS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose complex SGD that permits complex parameters and derive convergence guarantees under assumptions parallel to the real setting. These results also hold for GD. With the same assumptions we show that directional bias results extend from real to complex kernel regression problems. Empirical tests in complex RKHS recover superoscillation functions and Blaschke products as optimal solutions for particular loss functions.
What carries the argument
Complex SGD, defined via a complex gradient that enables parameter updates without analyticity constraints and allows direct transfer of real-case convergence arguments.
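Such an update can be sketched in a few lines. The following is a hypothetical minimal implementation on a toy complex least-squares problem, not the paper's code; the factor-of-two convention for the Wirtinger gradient varies across references.

```python
import numpy as np

# Hypothetical sketch of complex SGD on the toy objective
# f(w) = (1/m) * sum_i |x_i^H w - y_i|^2, using the conjugate Wirtinger
# gradient d f / d conj(w) = (x_i^H w - y_i) * x_i per sample. This needs
# only real differentiability of f in (Re w, Im w), not analyticity.
rng = np.random.default_rng(0)

n, m = 3, 200
w_true = rng.normal(size=n) + 1j * rng.normal(size=n)
X = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
y = X.conj() @ w_true                      # noiseless targets x_i^H w_true

w = np.zeros(n, dtype=complex)
eta = 0.05                                 # step size; needs eta * ||x_i||^2 < 2
for t in range(2000):
    i = rng.integers(m)                    # sample one data point uniformly
    g = (X[i].conj() @ w - y[i]) * X[i]    # stochastic conjugate Wirtinger gradient
    w = w - eta * g                        # complex SGD update

print(np.linalg.norm(w - w_true))
```

Under interpolation (noiseless targets) the stochastic gradient vanishes at the optimum, so the iterates converge to `w_true`; the update touches the complex structure only through conjugation, which is the sense in which the real-case argument transfers.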
If this is right
- Convergence proofs for real SGD and GD transfer directly to complex-valued optimization problems.
- Directional bias in kernel regression holds in complex reproducing kernel Hilbert spaces.
- Complex SGD recovers superoscillation functions from the Fock space and Blaschke products from the Hardy space for appropriate losses.
- Complex-valued neural networks can be trained at large scale without analyticity constraints.
Where Pith is reading between the lines
- The parallel assumption structure suggests that existing real-valued SGD analyses can be reused for complex domains with minimal modification.
- Applications in signal processing or quantum-inspired models may benefit from the bias-preserving property when fitting complex kernels.
- Non-convex extensions could be tested by checking whether the same bias directions appear in complex neural network training.
Load-bearing premise
The standard assumptions of convexity, smoothness, and bounded variance continue to guarantee convergence when both parameters and gradients are complex-valued.
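One plausible complex transcription of these assumptions (a sketch only; the paper's exact statements, and the factor-of-two Wirtinger convention, may differ) replaces the real inner product by the real part of the Hermitian one:

```latex
% Convexity (note the real part of the Hermitian inner product):
f(v) \;\ge\; f(w) + 2\,\mathrm{Re}\,\langle \nabla f(w),\, v - w \rangle
% L-smoothness:
\|\nabla f(w) - \nabla f(v)\| \;\le\; L\,\|w - v\|
% Bounded variance of the stochastic gradient g_t:
\mathbb{E}\,\|g_t - \nabla f(w_t)\|^2 \;\le\; \sigma^2
```

Here \(\nabla f\) denotes the conjugate Wirtinger gradient \(\partial f / \partial \bar{w}\) of the real-valued objective, so all three conditions make sense without any analyticity requirement.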
What would settle it
A concrete counter-example in which complex SGD fails to converge or loses the predicted directional bias in a low-dimensional complex kernel regression task under the listed assumptions.
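One shape such a probe could take (hypothetical, not the paper's experiment): complex gradient descent on a Fock-kernel regression problem, checking the monotone descent that the assumptions predict.

```python
import numpy as np

# Hypothetical probe: kernel regression in the Fock space, whose
# reproducing kernel is K(z, w) = exp(z * conj(w)), fitted by complex
# gradient descent on the expansion coefficients alpha.
rng = np.random.default_rng(1)
z = 0.5 * np.exp(2j * np.pi * np.arange(5) / 5)   # 5 distinct sample points
y = rng.normal(size=5) + 1j * rng.normal(size=5)  # complex targets

K = np.exp(np.outer(z, z.conj()))                 # Hermitian PSD Gram matrix
alpha = np.zeros(5, dtype=complex)
eta = 0.01                                        # small enough for descent

losses = []
for _ in range(500):
    r = K @ alpha - y
    losses.append(np.vdot(r, r).real)             # squared residual norm
    alpha = alpha - eta * (K.conj().T @ r)        # conjugate Wirtinger gradient step
```

For this quadratic loss and a sufficiently small step size, the loss sequence should be monotonically non-increasing; a failure of descent, or of the predicted bias direction, in a setup like this would be the counter-example sought.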
read the original abstract
Stochastic Gradient Descent (SGD) is a known stochastic iterative method popular for large-scale convex optimization problems due to its simple implementation and scalability. Some objectives, such as those found in complex-valued neural networks, benefit from updates like in SGD and Gradient Descent (GD) with a newly defined "gradient" that allows for complex parameters. This complex variant of the SGD/GD methods has already been proposed, but convergence guarantees without analyticity constraints have not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. Notably, these results extend to GD as well, and with the same set of assumptions, we confirm that some directional bias results extend from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of the complex SGD in kernel regression problems utilizing complex reproducing kernel Hilbert spaces. In particular, we demonstrate we may recover superoscillation functions and Blaschke products from the Fock Space and Hardy Space, respectively, as the optimal functions for a particular choice of a loss function.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a complex-valued variant of SGD (complex SGD) for optimization with complex parameters, extending to GD, and claims convergence guarantees under assumptions (convexity, smoothness, bounded variance) that parallel the real-valued case without requiring analyticity. It further asserts that directional bias results from real kernel regression carry over to the complex RKHS setting, and provides empirical demonstrations of recovering superoscillation functions from the Fock space and Blaschke products from the Hardy space via suitable loss functions in complex kernel regression.
Significance. If the convergence analysis holds under the stated parallel assumptions, the work would meaningfully extend SGD theory to complex domains, supporting applications in complex-valued neural networks and kernel methods where analyticity is undesirable. The empirical recovery of specific functions in Fock and Hardy spaces provides concrete evidence of utility in complex RKHS settings. The lack of analyticity constraints is a potential strength over prior complex optimization literature.
major comments (2)
- [Convergence analysis / Theorem on complex SGD] Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): The bounded-variance assumption is transplanted directly from the real case to bound E[||g_t - ∇f||^2] ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of the inequality; without it the stochastic-error control step is not secured by the listed assumptions alone.
- [Directional bias results / kernel regression experiments] Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.
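For concreteness, the expansion at issue in the first comment might look as follows (an illustrative sketch, not the paper's derivation). With the update \(w_{t+1} = w_t - \eta\, g_t\),

```latex
\|w_{t+1} - w_*\|^2
  = \|w_t - w_* - \eta\, g_t\|^2
  = \|w_t - w_*\|^2
    - 2\eta\,\mathrm{Re}\,\langle g_t,\; w_t - w_* \rangle
    + \eta^2\,\|g_t\|^2 .
```

Only the real part of the sesquilinear inner product survives in the cross term, so the real-case argument could go through once each inner product is replaced by its real part and \(\mathbb{E}\,\|g_t - \nabla f(w_t)\|^2 \le \sigma^2\) is invoked to control the last term; the referee's point is that this substitution must be carried out explicitly rather than assumed.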
minor comments (2)
- [Introduction] The definition of the 'newly defined gradient' for complex parameters should be stated explicitly with notation in the introduction or §2 rather than deferred to the methods.
- [Experiments] Empirical figures for superoscillation and Blaschke product recovery would benefit from error bars or multiple random seeds to demonstrate stability of the complex SGD.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for highlighting these important points regarding the rigor of the convergence analysis and the directional bias extension. We address each major comment below and will incorporate the requested clarifications and derivations into a revised version.
read point-by-point responses
-
Referee: Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): The bounded-variance assumption is transplanted directly from the real case to bound E[||g_t - ∇f||^2] ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of the inequality; without it the stochastic-error control step is not secured by the listed assumptions alone.
Authors: We agree that an explicit re-derivation is required to rigorously establish the descent inequality under the sesquilinear inner product and complex modulus. Although the assumptions are stated to parallel the real-valued case, the stochastic-error control step does need to be re-derived from first principles for the complex setting. In the revised manuscript we will insert a complete, self-contained derivation of the key inequality (including the handling of the complex modulus and the expectation of the squared norm), thereby securing the convergence result under the listed assumptions. revision: yes
-
Referee: Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.
Authors: We acknowledge that a direct comparison is necessary to demonstrate that the extension accounts for phase and the Hermitian inner-product structure rather than being merely formal. In the revised version we will expand the directional-bias subsection to include side-by-side statements of the real-valued bias equation and its complex counterpart, explicitly showing how the bias term is modified by the sesquilinear form and the phase factor. This will make the substantive nature of the extension clear. revision: yes
Circularity Check
No circularity: convergence and bias extensions derived from parallel assumptions without self-referential reduction.
full rationale
The paper introduces complex SGD as a variant allowing complex parameters, then states convergence guarantees under assumptions (convexity, smoothness, bounded variance) that explicitly parallel the real-valued case, without analyticity constraints. Directional bias results for kernel regression in complex RKHS are presented as extensions confirmed under the same assumptions. No equations or claims reduce a target quantity to its own definition or fitted inputs by construction. No load-bearing self-citation chain is invoked to justify uniqueness or the core proofs; the work treats the real-setting results as external benchmarks and transplants the assumption structure directly. This is a standard honest extension rather than a renaming or ansatz smuggling. The derivation chain remains self-contained against external real-valued SGD literature.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Assumptions parallel to those used for real-valued SGD (convexity, smoothness, bounded variance) hold and suffice for complex parameters and gradients.