Recognition: unknown
Arithmetic functions and learning theory
Pith reviewed 2026-05-10 11:42 UTC · model grok-4.3
The pith
The Möbius function restricted to squarefree integers up to R has Fourier ratio at least R to the power minus one quarter, so any uniform learner needs order R samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the lower bound of Pandey and Radziwiłł for the L1 norm of exponential sums with Möbius coefficients, the Fourier ratio satisfies FR(μ_R) ≫ R^{-1/4-ε} for every ε > 0. The class of {-1,1}-valued functions on the squarefree integers with Fourier ratio at least a fixed positive constant therefore has VC-dimension at least cR for an absolute c > 0. It follows that any distribution-independent learning algorithm that succeeds uniformly on the class H_R(η_R) containing μ_R, where η_R tends to zero, requires at least Ω(R) samples. The same method applies to the Liouville function, and a conditional improvement follows from a strong uniform bound on additive twists of the Möbius function.
What carries the argument
The Fourier ratio FR(f) of a function f, which quantifies the spread of its Fourier coefficients and determines the VC-dimension of the class of functions with FR at least a positive constant.
If this is right
- Any distribution-independent learner for the class H_R(η_R) requires Ω(R) samples.
- The identical sample-complexity lower bound holds for the Liouville function by the same argument.
- A stronger lower bound on the Fourier ratio, and thus on sample complexity, follows conditionally from a uniform bound on additive twists of the Möbius function.
Where Pith is reading between the lines
- Bounds on exponential sums for other multiplicative functions may yield analogous Fourier-ratio lower bounds and learning hardness results.
- The method provides a template for turning arithmetic oscillation estimates into VC-dimension statements for restricted function classes on intervals of length R.
Load-bearing premise
The proof depends on the lower bound for the L1 norm of exponential sums with Möbius coefficients from Pandey and Radziwiłł holding at the stated strength.
What would settle it
An algorithm that learns the class H_R(η_R) with o(R) samples, or a proof that FR(μ_R) is o(R^{-1/4}), would falsify the central claim.
Figures
read the original abstract
We establish a connection between analytic number theory and computational learning theory by showing that the M\"obius function belongs to a class of functions that is statistically hard to learn from random samples. Let $\mu_R$ denote the restriction of the M\"obius function to the squarefree integers in $\{1,\dots,R\}$. Using a recent lower bound of Pandey and Radziwi{\l}{\l} for the $L^1$ norm of exponential sums with M\"obius coefficients, we prove that \[ \FR(\mu_R) \gg R^{-1/4-\epsilon} \] for every $\epsilon>0$. We then show that, for a suitable absolute constant $c_0>0$, the class of $\{-1,1\}$-valued functions on the squarefree integers with Fourier Ratio at least $c_0$ has Vapnik--Chervonenkis dimension at least $cR$. It follows that any distribution-independent learning algorithm that succeeds uniformly on the class $\mathcal{H}_R(\eta_R)$ containing $\mu_R$, where $\eta_R \to 0$, requires at least $\Omega(R)$ samples. We also discuss a conditional improvement under a strong uniform bound for additive twists of the M\"obius function, and we note that the same method applies to the Liouville function.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to link analytic number theory and computational learning theory by restricting the Möbius function μ to squarefree integers up to R (denoted μ_R) and proving, via a recent L1 lower bound of Pandey and Radziwiłł on exponential sums with Möbius coefficients, that the Fourier ratio satisfies FR(μ_R) ≫ R^{-1/4-ε} for every ε > 0. It then argues that any class of {-1,1}-valued functions on squarefree integers with Fourier ratio at least a fixed positive constant has VC-dimension at least cR for an absolute c > 0; consequently the class H_R(η_R) containing μ_R (with η_R → 0) requires Ω(R) samples for any distribution-independent learning algorithm that succeeds uniformly. The same method is said to apply to the Liouville function, and a conditional improvement is discussed under a strong uniform bound on additive twists of μ.
Significance. If the quantitative translation from the imported L1 bound to the Fourier-ratio lower bound holds with the stated exponent, the work supplies a concrete arithmetic example of a statistically hard-to-learn class whose sample complexity is linear in the domain size. The explicit dependence on an external result is transparent, the extension to the Liouville function is noted, and the conditional improvement is clearly separated. These features give the paper modest but genuine interdisciplinary interest; the primary novelty lies in the learning-theoretic interpretation rather than in new number-theoretic estimates.
major comments (2)
- [Proof of the Fourier-ratio lower bound (immediately after the statement of the Pandey–Radziwiłł theorem)] In the derivation that FR(μ_R) ≫ R^{-1/4-ε} (the step immediately following the citation of the Pandey–Radziwiłł L1 lower bound), confirm that the passage from ||∑_{n≤x} μ(n) e(α n)||_1 to a lower bound on the Fourier ratio preserves the exponent -1/4-ε without an extra logarithmic loss. Provide the precise inequality or lemma that converts the L1 norm into the ratio FR(μ_R) = max_α |∑ μ(n) e(α n)| / ||μ_R||_2 or its equivalent definition used in the paper.
- [VC-dimension lower bound for H_R(η_R)] In the argument that Fourier ratio ≥ c_0 implies VC-dimension ≥ c R for the class of {-1,1}-valued functions on squarefree integers (the step leading to the Ω(R) sample-complexity claim), verify that the constant c_0 can be chosen independently of R and that the class H_R(η_R) containing μ_R satisfies the uniform-success condition with η_R → 0. State explicitly the relation between the Fourier-ratio threshold and the VC-dimension lower bound.
minor comments (3)
- [Introduction / Notation] Define the Fourier ratio FR(f) with an equation number at the first appearance and state its precise normalization (e.g., whether the denominator is the L2 norm on the squarefree support or another quantity).
- [References] Supply the full bibliographic details (arXiv number or journal reference) for the Pandey–Radziwiłł result in the bibliography.
- [Conditional improvement paragraph] In the discussion of the conditional improvement, clarify whether the stronger uniform bound on additive twists is assumed only for the Möbius function or also for the Liouville function.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for isolating the two technical steps that benefit from explicit clarification. We address both major comments below and will revise the manuscript to incorporate the requested details and explicit statements.
read point-by-point responses
-
Referee: [Proof of the Fourier-ratio lower bound (immediately after the statement of the Pandey–Radziwiłł theorem)] In the derivation that FR(μ_R) ≫ R^{-1/4-ε} (the step immediately following the citation of the Pandey–Radziwiłł L1 lower bound), confirm that the passage from ||∑_{n≤x} μ(n) e(α n)||_1 to a lower bound on the Fourier ratio preserves the exponent -1/4-ε without an extra logarithmic loss. Provide the precise inequality or lemma that converts the L1 norm into the ratio FR(μ_R) = max_α |∑ μ(n) e(α n)| / ||μ_R||_2 or its equivalent definition used in the paper.
Authors: The conversion is immediate and introduces no logarithmic loss. Let S(α) = ∑_{n≤R squarefree} μ(n) e(α n). The cited Pandey–Radziwiłł theorem yields ∫_0^1 |S(α)| dα ≫ R^{1/4-ε} (with the squarefree restriction absorbed into the implied constant). The elementary inequality max_α |S(α)| ≥ ∫_0^1 |S(α)| dα then gives max_α |S(α)| ≫ R^{1/4-ε}. Because ||μ_R||_2 = Θ(R^{1/2}), we obtain FR(μ_R) = max_α |S(α)| / ||μ_R||_2 ≫ R^{1/4-ε} / R^{1/2} = R^{-1/4-ε}. Any polylogarithmic factors in the L^1 bound are absorbed into the ε. We will insert this short chain of inequalities immediately after the citation of the Pandey–Radziwiłł theorem. revision: yes
-
Referee: [VC-dimension lower bound for H_R(η_R)] In the argument that Fourier ratio ≥ c_0 implies VC-dimension ≥ c R for the class of {-1,1}-valued functions on squarefree integers (the step leading to the Ω(R) sample-complexity claim), verify that the constant c_0 can be chosen independently of R and that the class H_R(η_R) containing μ_R satisfies the uniform-success condition with η_R → 0. State explicitly the relation between the Fourier-ratio threshold and the VC-dimension lower bound.
Authors: The constant c_0 > 0 is absolute and chosen independently of R; it is fixed once and for all so that every function with Fourier ratio at least c_0 has a uniformly positive correlation with a suitable character, which is the key ingredient in the shattering argument that yields VC-dimension at least c R with c > 0 absolute. The class H_R(η_R) consists of all {-1,1}-valued functions on the squarefree integers up to R whose Fourier ratio is at least η_R, where we set η_R ≍ R^{-1/4-ε} so that η_R → 0 while still ensuring μ_R ∈ H_R(η_R). The VC lower bound is proved for the subclass of functions with Fourier ratio at least the fixed c_0. For all sufficiently large R we have η_R < c_0, whence this subclass is contained in H_R(η_R). The VC-dimension of H_R(η_R) is therefore at least as large as that of the subclass, i.e., at least c R. Any distribution-independent learner that succeeds uniformly on the whole class H_R(η_R) must in particular succeed on the subclass, so the Ω(R) sample-complexity lower bound carries over directly. We will add a short paragraph making the inclusion and the inheritance of the VC lower bound explicit. revision: yes
Circularity Check
No circularity: central lower bound imported from independent external source
full rationale
The paper's derivation imports the key L1 lower bound on Möbius exponential sums directly from the independent Pandey-Radziwiłł result (not a self-citation). It then applies this external fact to the paper's own definition of the Fourier ratio FR(μ_R) to obtain the stated lower bound, and feeds that into a standard VC-dimension argument for the class H_R(η_R). No step reduces a claimed prediction or uniqueness result back to a fitted parameter, self-definition, or author-overlapping citation chain inside the present work. The sample-complexity conclusion follows mechanically from the VC lower bound once the external input is granted, satisfying the self-contained criterion against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Pandey-Radziwiłł lower bound on the L1 norm of exponential sums with Möbius coefficients
Reference graph
Works this paper leans on
-
[1]
K. Aldahleh, W. Burstein, G. Garza, A. Iosevich, J. Iosevich, A. Khalil, J. King, T. Le, I. Li, A. Mayeli, K. Nguyen, and N. Shaffer,The Fourier ratio and complexity of signals. (arXiv:2511.19560), (2025). 2
-
[2]
Apostol,Introduction to Analytic Number Theory, Springer-Verlag, New York, (1976)
T. Apostol,Introduction to Analytic Number Theory, Springer-Verlag, New York, (1976). 5
1976
-
[3]
R. C. Baker and G. Harman,Exponential sums formed with the M¨ obius function, J. London Math. Soc. (2) 43 (1991), no. 2, 193–198. 3, 6, 13
1991
-
[4]
Balog and I
A. Balog and I. Z. Ruzsa,On the exponential sum overr-free integers, Acta Math. Hungar. 90 (2001), 219–230. 5
2001
-
[5]
Bourgain,On the correlation of the M¨ obius function with rank-one systems, Journal d’Analyse Math´ ematique 120 (2013), 105–130
J. Bourgain,On the correlation of the M¨ obius function with rank-one systems, Journal d’Analyse Math´ ematique 120 (2013), 105–130. 4
2013
-
[6]
Davenport,Multiplicative Number Theory, 2nd ed., Springer-Verlag, New York, (1980)
H. Davenport,Multiplicative Number Theory, 2nd ed., Springer-Verlag, New York, (1980). 2
1980
-
[7]
Deodhar and A
S. Deodhar and A. Iosevich,Spectral synthesis with the complexity parameter, (preprint), (2026). 2
2026
-
[8]
Ehrenfeucht, D
A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant,A general lower bound on the number of examples needed for learning, Information and Computation, 82(3):247–261, (1989). 9
1989
-
[9]
Green,The M¨ obius function is strongly orthogonal to nilsequences, Annals of Mathematics 175 (2012), 541–566
B. Green,The M¨ obius function is strongly orthogonal to nilsequences, Annals of Mathematics 175 (2012), 541–566. 4
2012
-
[10]
Haagerup,The best constants in the Khintchine inequality, Studia Mathematica, 70(3):231–283, (1981)
U. Haagerup,The best constants in the Khintchine inequality, Studia Mathematica, 70(3):231–283, (1981)
1981
-
[11]
A. Iosevich, Z. Li, E. Palsson, and A. Yavicoli,The Fourier Ratio: Uncertainty, Restriction, and Approximation for Compactly Supported Measures, arXiv:2512.16751, (2025). 2
-
[12]
A. Iosevich, A. Mayeli, and E. Wyman,Spectral synthesis on Riemannian manifolds, preprint, arXiv:2603.21451 (2026). 2
-
[13]
S. V. Konyagin,On the Littlewood problem, Izv. Akad. Nauk SSSR Ser. Mat. 45 (1981), 243–265. 2
1981
-
[14]
H. L. Montgomery, H. L. Pigno, and K. Smith,A generalization of a theorem of Littlewood, Proc. Amer. Math. Soc. 83 (1981), 341–345. 2
1981
-
[15]
Pandey and M
M. Pandey and M. Radziwi l l,Means of exponential sums with multiplicative coefficients. I., Compositio Mathematica, (to appear). 3, 5, 6, 13, 14
-
[16]
M. Pandey and M. Radziwi l l,Means of exponential sums with multiplicative coefficients. II., arXiv:2510.20194. 2
-
[17]
B. Rodgers, J. Bennatar, and N. Nishry,Gaussian distribution for short sums of multiplicative functions, arXiv:2012.15507. 10
-
[18]
Sarnak,Three lectures on the M¨ obius function, randomness and dynamics, Lecture notes, IAS, 2010
P. Sarnak,Three lectures on the M¨ obius function, randomness and dynamics, Lecture notes, IAS, 2010. 2
2010
-
[19]
Tao,The Chowla conjecture and the Sarnak conjecture, blog post, 2012
T. Tao,The Chowla conjecture and the Sarnak conjecture, blog post, 2012. 2
2012
-
[20]
V. N. Vapnik and A. Ya. Chervonenkis,On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, 16(2):264–280, (1971). 9 Department of Mathematics, University of Rochester, Rochester, NY, USA Email address:willburst88@gmail.com Department of Mathematics, University of Rochester, Rochest...
1971
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.