pith. machine review for the scientific record.

arxiv: 2604.23017 · v1 · submitted 2026-04-24 · 💻 cs.LG · cs.NA · math.CV · math.NA


Complex SGD and Directional Bias in Reproducing Kernel Hilbert Spaces


Pith reviewed 2026-05-08 12:10 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.CV · math.NA
keywords complex SGD · reproducing kernel Hilbert spaces · directional bias · kernel regression · convergence guarantees · complex parameters · Fock space · Hardy space

The pith

Complex SGD converges under the same assumptions as real SGD and extends directional bias to kernel regression in complex RKHS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a complex variant of stochastic gradient descent that supports updates on complex-valued parameters. It establishes convergence guarantees that mirror the standard real-valued assumptions of convexity, smoothness, and bounded variance, without imposing analyticity requirements. The same guarantees cover gradient descent. In reproducing kernel Hilbert spaces, the directional bias previously shown for real kernel regression carries over to the complex setting. Experiments confirm that this complex SGD recovers superoscillation functions in the Fock space and Blaschke products in the Hardy space for suitably chosen losses.
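
The "newly defined gradient" is most plausibly the Wirtinger (CR-calculus) derivative of the complex-gradient literature the paper cites (Brandwood [2]; Kreutz-Delgado [3]). As an editorial gloss rather than the paper's own notation: writing w = u + iv for a real-valued loss f,

    ∂f/∂w̄ = (1/2)(∂f/∂u + i ∂f/∂v),

and the complex SGD step is

    w_{t+1} = w_t − η_t g_t,   with E[g_t] = ∂f/∂w̄(w_t).

This needs only real differentiability of f : ℂ^d → ℝ, not analyticity; a non-constant real-valued f can never be analytic, which is why an analyticity-free formulation is required at all.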

Core claim

We propose a complex SGD that permits complex parameters and derive convergence guarantees under assumptions parallel to the real setting. These results also hold for GD. Under the same assumptions, we show that directional bias results extend from real to complex kernel regression problems. Empirical tests in complex RKHS recover superoscillation functions and Blaschke products as optimal solutions for particular loss functions.

What carries the argument

Complex SGD, defined via a complex gradient that enables parameter updates without analyticity constraints and allows direct transfer of real-case convergence arguments.
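
A minimal runnable sketch of such an update, assuming a per-sample squared-modulus loss f_i(w) = |x_i^H w − y_i|² (our illustrative choice, not necessarily the paper's). Under the Wirtinger convention above, ∂f_i/∂w̄ = (x_i^H w − y_i) x_i, so each step coincides with the classical complex LMS update [4]:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, lr, epochs = 4, 200, 0.05, 50

    # Toy complex least-squares data: y_i = x_i^H w* (Hermitian inner product).
    w_star = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    X = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2)
    y = X.conj() @ w_star

    # Complex SGD: the Wirtinger gradient of |x_i^H w - y_i|^2 with respect to
    # conj(w) is (x_i^H w - y_i) * x_i -- no analyticity of the loss is used.
    w = np.zeros(d, dtype=complex)
    for _ in range(epochs):
        for i in rng.permutation(n):
            e = np.vdot(X[i], w) - y[i]   # np.vdot conjugates its first argument
            w -= lr * e * X[i]

    print(np.max(np.abs(w - w_star)))     # near 0: the iterates recover w*

The loss is real-valued and non-holomorphic, yet the real-case convergence argument applies step for step, which is the transfer the paper claims.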

If this is right

  • Convergence proofs for real SGD and GD transfer directly to complex-valued optimization problems.
  • Directional bias in kernel regression holds in complex reproducing kernel Hilbert spaces.
  • Complex SGD recovers superoscillation functions from the Fock space and Blaschke products from the Hardy space for appropriate losses.
  • Complex-valued neural networks can be trained at large scale without analyticity constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The parallel assumption structure suggests that existing real-valued SGD analyses can be reused for complex domains with minimal modification.
  • Applications in signal processing or quantum-inspired models may benefit from the bias-preserving property when fitting complex kernels.
  • Non-convex extensions could be tested by checking whether the same bias directions appear in complex neural network training.

Load-bearing premise

The standard assumptions of convexity, smoothness, and bounded variance continue to guarantee convergence when both parameters and gradients are complex-valued.

What would settle it

A concrete counter-example in which complex SGD fails to converge or loses the predicted directional bias in a low-dimensional complex kernel regression task under the listed assumptions.
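
A hypothetical harness for such a test, using the Szegő (Hardy-space) kernel k(z, w) = 1/(1 − z w̄) and a single Blaschke factor as the target; the kernel, sample points, step size, and loss are our assumptions, not the paper's setup:

    import numpy as np

    rng = np.random.default_rng(1)
    n, lr, steps = 8, 0.05, 40000

    # Sample points in the disk of radius 0.6; Szego-kernel Gram matrix.
    z = 0.6 * np.sqrt(rng.uniform(size=n)) * np.exp(2j * np.pi * rng.uniform(size=n))
    K = 1.0 / (1.0 - np.outer(z, z.conj()))   # Hermitian, positive definite

    a = 0.4 + 0.3j                            # hypothetical Blaschke zero
    y = (z - a) / (1 - np.conj(a) * z)        # target: one Blaschke factor

    # Complex SGD on f(c) = sum_i |K[i] @ c - y_i|^2; the Wirtinger gradient
    # of the i-th term with respect to conj(c) is (K[i] @ c - y_i) * conj(K[i]).
    c = np.zeros(n, dtype=complex)
    res0 = np.linalg.norm(K @ c - y)
    for _ in range(steps):
        i = rng.integers(n)
        e = K[i] @ c - y[i]
        c -= lr * e * K[i].conj()

    print(res0, "->", np.linalg.norm(K @ c - y))  # residual should shrink markedly

A run in which this residual stalls, or in which the error direction departs from the predicted bias while convexity, smoothness, and bounded variance all hold, would be exactly the counter-example described above.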

read the original abstract

Stochastic Gradient Descent (SGD) is a known stochastic iterative method popular for large-scale convex optimization problems due to its simple implementation and scalability. Some objectives, such as those found in complex-valued neural networks, benefit from updates like in SGD and Gradient Descent (GD) with a newly defined “gradient” that allows for complex parameters. This complex variant of the SGD/GD methods has already been proposed, but convergence guarantees without analyticity constraints have not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. Notably, these results extend to GD as well, and with the same set of assumptions, we confirm that some directional bias results extend from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of the complex SGD in kernel regression problems utilizing complex reproducing kernel Hilbert spaces. In particular, we demonstrate we may recover superoscillation functions and Blaschke products from the Fock Space and Hardy Space, respectively, as the optimal functions for a particular choice of a loss function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a complex-valued variant of SGD (complex SGD) for optimization with complex parameters, extending to GD, and claims convergence guarantees under assumptions (convexity, smoothness, bounded variance) that parallel the real-valued case without requiring analyticity. It further asserts that directional bias results from real kernel regression carry over to the complex RKHS setting, and provides empirical demonstrations of recovering superoscillation functions from the Fock space and Blaschke products from the Hardy space via suitable loss functions in complex kernel regression.

Significance. If the convergence analysis holds under the stated parallel assumptions, the work would meaningfully extend SGD theory to complex domains, supporting applications in complex-valued neural networks and kernel methods where analyticity is undesirable. The empirical recovery of specific functions in Fock and Hardy spaces provides concrete evidence of utility in complex RKHS settings. The lack of analyticity constraints is a potential strength over prior complex optimization literature.

major comments (2)
  1. [Convergence analysis / Theorem on complex SGD] Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): the bounded-variance assumption is transplanted directly from the real case to bound E‖g_t − ∇f(w_t)‖² ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of that inequality; without it, the stochastic-error control step is not secured by the listed assumptions alone. (A sketch of how such a re-derivation could go follows the minor comments.)
  2. [Directional bias results / kernel regression experiments] Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.
minor comments (2)
  1. [Introduction] The definition of the 'newly defined gradient' for complex parameters should be stated explicitly with notation in the introduction or §2 rather than deferred to the methods.
  2. [Experiments] Empirical figures for superoscillation and Blaschke product recovery would benefit from error bars or multiple random seeds to demonstrate stability of the complex SGD.
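
For concreteness, one plausible shape for the re-derivation requested in major comment 1, offered as an editorial sketch under the Wirtinger convention (set ∇f := 2 ∂f/∂w̄, which matches the real gradient under the identification ℂ^d ≅ ℝ^{2d}; the factor 2 can be absorbed into η). Real inner products become real parts of the sesquilinear form, ⟨a, b⟩_ℝ = Re(a^H b), so the L-smooth descent lemma gives, for w_{t+1} = w_t − η g_t,

    f(w_{t+1}) ≤ f(w_t) − η Re⟨∇f(w_t), g_t⟩ + (Lη²/2) ‖g_t‖².

Taking expectations with E[g_t] = ∇f(w_t) and E‖g_t − ∇f(w_t)‖² ≤ σ² yields

    E[f(w_{t+1})] ≤ f(w_t) − η(1 − Lη/2) ‖∇f(w_t)‖² + (Lη²/2) σ²,

which is exactly the real-valued recursion. The only step that genuinely needs checking is that Re⟨·,·⟩ and ‖·‖² obey the same Cauchy–Schwarz and bias-variance splits as in the real case, which the sesquilinear form guarantees.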

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting these important points regarding the rigor of the convergence analysis and the directional bias extension. We address each major comment below and will incorporate the requested clarifications and derivations into a revised version.

read point-by-point responses
  1. Referee: Convergence theorem for complex SGD (likely §3 or the main theorem paralleling real SGD): the bounded-variance assumption is transplanted directly from the real case to bound E‖g_t − ∇f(w_t)‖² ≤ σ² inside the descent inequality, but the sesquilinear inner product and complex modulus require an explicit re-derivation of that inequality; without it, the stochastic-error control step is not secured by the listed assumptions alone.

    Authors: We agree that an explicit re-derivation is required to rigorously establish the descent inequality under the sesquilinear inner product and complex modulus. Although the assumptions are stated to parallel the real-valued case, the stochastic-error control step does need to be re-derived from first principles for the complex setting. In the revised manuscript we will insert a complete, self-contained derivation of the key inequality (including the handling of the complex modulus and the expectation of the squared norm), thereby securing the convergence result under the listed assumptions. revision: yes

  2. Referee: Extension of directional bias to complex kernel regression (kernel regression section): The claim that 'some directional bias results extend' is stated under the same assumptions, but the bias analysis must account for phase and Hermitian structure; a direct comparison to the real-valued bias equations (e.g., the specific bias term) is needed to confirm the extension is not merely formal.

    Authors: We acknowledge that a direct comparison is necessary to demonstrate that the extension accounts for phase and the Hermitian inner-product structure rather than being merely formal. In the revised version we will expand the directional-bias subsection to include side-by-side statements of the real-valued bias equation and its complex counterpart, explicitly showing how the bias term is modified by the sesquilinear form and the phase factor. This will make the substantive nature of the extension clear. revision: yes
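
One concrete template for the requested side-by-side, offered editorially rather than drawn from the paper: in the Kaczmarz-style analysis of [28] for a real system Ax = b with rows sampled proportionally to ‖a_i‖², the expected error decomposes along the right singular vectors v_ℓ of A as

    E[x_t − x⋆] = Σ_ℓ (1 − σ_ℓ²/‖A‖_F²)^t ⟨x_0 − x⋆, v_ℓ⟩ v_ℓ,

so components along large singular values decay fastest and the smallest-singular-vector direction dominates asymptotically. In the complex analogue, A^T A becomes the Hermitian A^H A, the inner product becomes sesquilinear, and each v_ℓ is defined only up to a global phase e^{iθ}; alignment must therefore be measured by |⟨x_t − x⋆, v_min⟩| / ‖x_t − x⋆‖ rather than by signed convergence to ±v_min. That phase invariance is precisely the ingredient the promised side-by-side comparison should track.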

Circularity Check

0 steps flagged

No circularity: convergence and bias extensions derived from parallel assumptions without self-referential reduction.

full rationale

The paper introduces complex SGD as a variant allowing complex parameters, then states convergence guarantees under assumptions (convexity, smoothness, bounded variance) that explicitly parallel the real-valued case, without analyticity constraints. Directional bias results for kernel regression in complex RKHS are presented as extensions confirmed under the same assumptions. No equations or claims reduce a target quantity to its own definition or fitted inputs by construction. No load-bearing self-citation chain is invoked to justify uniqueness or the core proofs; the work treats the real-setting results as external benchmarks and transplants the assumption structure directly. This is a straightforward, honest extension rather than a renaming or ansatz smuggling, and the derivation chain stays self-contained, leaning on the real-valued SGD literature only as a benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is based solely on the abstract; the paper relies on standard real-valued SGD assumptions adapted to the complex domain. No free parameters or new entities are mentioned.

axioms (1)
  • domain assumption: Assumptions parallel to those used for real-valued SGD (convexity, smoothness, bounded variance) hold and suffice for complex parameters and gradients.
    The abstract states convergence guarantees are provided 'under assumptions that parallel those from the real setting'.

pith-pipeline@v0.9.0 · 5505 in / 1464 out tokens · 66296 ms · 2026-05-08T12:10:09.206864+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 5 canonical work pages

  1. [1]

    Wirtinger calculus based gradient descent and Levenberg-Marquardt learning algorithms in complex-valued neural networks,

    M. F. Amin, M. I. Amin, A. Y. H. Al-Nuaimi, and K. Murase, “Wirtinger calculus based gradient descent and Levenberg-Marquardt learning algorithms in complex-valued neural networks,” in Neural Information Processing, B.-L. Lu, L. Zhang, and J. Kwok, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 550–559

  2. [2]

    A complex gradient operator and its application in adaptive array theory,

    D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proceedings F: Communications, Radar and Signal Processing, vol. 130, no. 1, pp. 11–16, 1983

  3. [3]

    The complex gradient operator and the CR-calculus,

    K. Kreutz-Delgado, “The complex gradient operator and the CR-calculus,” arXiv preprint arXiv:0906.4835, 2009

  4. [4]

    The complex LMS algorithm,

    B. Widrow, J. McCool, and M. Ball, “The complex LMS algorithm,” Proceedings of the IEEE, vol. 63, no. 4, pp. 719–720, 1975

  5. [5]

    Complex gradient and Hessian,

    A. van den Bos, “Complex gradient and Hessian,” IEE Proceedings – Vision, Image and Signal Processing, vol. 141, no. 6, pp. 380–382, 1994

  6. [6]

    Complex-valued matrix differentiation: Techniques and key results,

    A. Hjørungnes and D. Gesbert, “Complex-valued matrix differentiation: Techniques and key results,” IEEE Transactions on Signal Processing, vol. 55, no. 6, pp. 2740–2746, 2007

  7. [7]

    A short tutorial on Wirtinger Calculus with applications in quantum information,

    K. Koor, Y. Qiu, L. C. Kwek, and P. Rebentrost, “A short tutorial on Wirtinger calculus with applications in quantum information,” arXiv preprint arXiv:2312.04858, 2023

  8. [8]

    Convergence analysis of an augmented algorithm for fully complex-valued neural networks,

    D. Xu, H. Zhang, and D. P. Mandic, “Convergence analysis of an augmented algorithm for fully complex-valued neural networks,” Neural Networks, vol. 69, pp. 44–50, 2015

  9. [9]

    Convergence analysis of fully complex backpropagation algorithm based on Wirtinger calculus,

    H. Zhang, X. Liu, D. Xu, and Y. Zhang, “Convergence analysis of fully complex backpropagation algorithm based on Wirtinger calculus,” Cognitive Neurodynamics, vol. 8, no. 3, pp. 261–266, 2014

  10. [10]

    Direction matters: On the implicit bias of stochastic gradient descent with moderate learning rate,

    J. Wu, D. Zou, V. Braverman, and Q. Gu, “Direction matters: On the implicit bias of stochastic gradient descent with moderate learning rate,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=3X64RLgzY6O

  11. [11]

    The directional bias helps stochastic gradient descent to generalize in kernel regression models,

    Y. Luo, X. Huo, and Y. Mei, “The directional bias helps stochastic gradient descent to generalize in kernel regression models,” in 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022, pp. 678–683

  12. [12]

    Some results on Tchebycheffian spline functions,

    G. S. Kimeldorf and G. Wahba, “Some results on Tchebycheffian spline functions,” Journal of Mathematical Analysis and Applications, vol. 33, no. 1, pp. 82–95, 1971

  13. [13]

    Representer theorem in complex reproducing kernel Hilbert spaces with applications to Fock and Hardy spaces and superoscillations,

    N. Alpay, A. De Martino, and K. Diki, “Representer theorem in complex reproducing kernel Hilbert spaces with applications to Fock and Hardy spaces and superoscillations,” 2026

  14. [14]

    Superoscillations and physical applications,

    A. N. Jordan, J. C. Howell, N. Vamivakas, and E. Karimi, “Superoscillations and physical applications,” in Operator Theory, D. Alpay, I. Sabadini, and F. Colombo, Eds. Basel: Springer, 2025

  15. [15]

    Superoscillation: from physics to optical applications,

    G. Chen, Z.-Q. Wen, and C.-W. Qiu, “Superoscillation: from physics to optical applications,” Light: Science & Applications, vol. 8, no. 1, p. 56, 2019

  16. [16]

    Multivariable functional interpolation and adaptive networks,

    D. S. Broomhead and D. Lowe, “Multivariable functional interpolation and adaptive networks,” Complex Systems, vol. 2, pp. 321–355, 1988

  17. [18]

    Analogues of finite Blaschke products as inner functions,

    C. Felder and T. Le, “Analogues of finite Blaschke products as inner functions,” Bull. Lond. Math. Soc., vol. 54, no. 4, pp. 1197–1219, 2022

  18. [19]

    Finite Blaschke Products and Their Connections,

    S. R. Garcia, J. Mashreghi, and W. T. Ross, Finite Blaschke Products and Their Connections. Cham: Springer, 2018, xix+328 pp. ISBN: 978-3-319-78246-1; 978-3-319-78247-8

  19. [20]

    Blaschke products and their applications,

    J. Mashreghi, Blaschke products and their applications. Springer, 2013

  20. [21]

    Handbook of convergence theorems for (stochastic) gradient methods,

    G. Garrigos and R. M. Gower, “Handbook of convergence theorems for (stochastic) gradient methods,” 2024. [Online]. Available: https://arxiv.org/abs/2301.11235

  21. [22]

    Non-asymptotic analysis of stochastic approximation algorithms for machine learning,

    E. Moulines and F. Bach, “Non-asymptotic analysis of stochastic approximation algorithms for machine learning,” Advances in Neural Information Processing Systems, vol. 24, 2011

  22. [23]

    Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,

    D. Needell, R. Ward, and N. Srebro, “Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,” Advances in Neural Information Processing Systems, vol. 27, 2014

  23. [24]

    An extension of the complex–real (C–R) calculus to the bicomplex setting, with applications,

    D. Alpay, K. Diki, and M. Vajiac, “An extension of the complex–real (C–R) calculus to the bicomplex setting, with applications,” Mathematische Nachrichten, vol. 297, no. 2, pp. 454–481, 2024

  24. [25]

    Phase retrieval via Wirtinger flow: Theory and algorithms,

    E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase retrieval via Wirtinger flow: Theory and algorithms,” IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985–2007, 2015

  25. [26]

    A Randomized Kaczmarz Algorithm with Exponential Convergence,

    T. Strohmer and R. Vershynin, “A Randomized Kaczmarz Algorithm with Exponential Convergence,” Journal of Fourier Analysis and Applications, vol. 15, 2007

  26. [27]

    Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,

    D. Needell, N. Srebro, and R. Ward, “Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,” Mathematical Programming, vol. 155, no. 1, pp. 549–573, Jan 2016. [Online]. Available: https://doi.org/10.1007/s10107-015-0864-7

  27. [28]

    Randomized Kaczmarz converges along small singular vectors,

    S. Steinerberger, “Randomized Kaczmarz converges along small singular vectors,” SIAM Journal on Matrix Analysis and Applications, vol. 42, pp. 608–615, 2021

  28. [29]

    The mathematics of superoscillations,

    Y. Aharonov, F. Colombo, I. Sabadini, D. Struppa, and J. Tollaksen, The mathematics of superoscillations. American Mathematical Society, 2017, vol. 247, no. 1174

  29. [30]

    Fractional supershifts and their associated Cauchy evolution problems,

    N. Alpay, “Fractional supershifts and their associated Cauchy evolution problems,” arXiv preprint arXiv:2601.11829, 2026

  30. [31]

    Superoscillations and Fock spaces,

    D. Alpay, F. Colombo, K. Diki, I. Sabadini, and D. C. Struppa, “Superoscillations and Fock spaces,” Journal of Mathematical Physics, vol. 64, no. 9, 2023

  31. [32]

    Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond,

    B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002

  32. [33]

    Kernel methods in machine learning,

    T. Hofmann, B. Schölkopf, and A. J. Smola, “Kernel methods in machine learning,” The Annals of Statistics, vol. 36, no. 3, pp. 1171–1220, 2008

  33. [34]

    Gaussian Processes for Machine Learning,

    C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006

  34. [35]

    Theory of H^p spaces,

    P. L. Duren, Theory of H^p spaces. New York: Academic Press, 1970

  35. [36]

    Theory of H^p spaces,

    ——, Theory of H^p spaces. New York: Academic Press, 1970

  36. [37]

    Real and complex analysis, 3rd ed.,

    W. Rudin, Real and complex analysis, 3rd ed. New York: McGraw-Hill Book Co., 1987

  37. [38]

    A new characterization of the Hardy space and of other Hilbert spaces of analytic functions,

    N. Alpay, “A new characterization of the Hardy space and of other Hilbert spaces of analytic functions,” Istanbul Journal of Mathematics, vol. 1, no. 1, pp. 1–11, 2023

  38. [39]

    Bedrosian identity in Blaschke product case,

    P. Cerejeiras, C. Qiuhui, and U. Kaehler, “Bedrosian identity in Blaschke product case,” Complex Anal. Oper. Theory, vol. 6, no. 1, pp. 275–300, 2012

  39. [40]

    Theory of Reproducing Kernels and Its Applications,

    S. Saitoh, Theory of Reproducing Kernels and Its Applications, ser. Pitman Research Notes in Mathematics Series. Harlow: Longman Scientific & Technical, 1988, vol. 189, co-published with John Wiley & Sons, New York.