arxiv: 2604.14482 · v1 · submitted 2026-04-15 · 🧮 math.NT · math.CA· math.ST· stat.TH

Recognition: unknown

Arithmetic functions and learning theory

W. Burstein , A. Iosevich , A. Sant

Authors on Pith no claims yet

Pith reviewed 2026-05-10 11:42 UTC · model grok-4.3

classification 🧮 math.NT math.CAmath.STstat.TH

keywords Möbius functionFourier ratioVC-dimensionlearning theoryexponential sumssquarefree integerssample complexity

0 comments

The pith

The Möbius function restricted to squarefree integers up to R has Fourier ratio at least R to the power minus one quarter, so any uniform learner needs order R samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a direct link between analytic number theory and computational learning theory by proving that the Möbius function is statistically hard to learn from random samples. It uses a known lower bound on the L1 norm of exponential sums with Möbius coefficients to show that the Fourier ratio of the restricted function is bounded below by roughly R to the power of negative one quarter. This lower bound implies that the relevant class of functions has Vapnik-Chervonenkis dimension linear in R. As a result, distribution-independent learning algorithms that work uniformly on the class require linearly many samples in R. The connection matters because it translates oscillatory behavior captured by exponential sums into concrete sample-complexity lower bounds for arithmetic functions.

Core claim

Using the lower bound of Pandey and Radziwiłł for the L1 norm of exponential sums with Möbius coefficients, the Fourier ratio satisfies FR(μ_R) ≫ R^{-1/4-ε} for every ε > 0. The class of {-1,1}-valued functions on the squarefree integers with Fourier ratio at least a fixed positive constant therefore has VC-dimension at least cR for an absolute c > 0. It follows that any distribution-independent learning algorithm that succeeds uniformly on the class H_R(η_R) containing μ_R, where η_R tends to zero, requires at least Ω(R) samples. The same method applies to the Liouville function, and a conditional improvement follows from a strong uniform bound on additive twists of the Möbius function.

What carries the argument

The Fourier ratio FR(f) of a function f, which quantifies the spread of its Fourier coefficients and determines the VC-dimension of the class of functions with FR at least a positive constant.

If this is right

Any distribution-independent learner for the class H_R(η_R) requires Ω(R) samples.
The identical sample-complexity lower bound holds for the Liouville function by the same argument.
A stronger lower bound on the Fourier ratio, and thus on sample complexity, follows conditionally from a uniform bound on additive twists of the Möbius function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Bounds on exponential sums for other multiplicative functions may yield analogous Fourier-ratio lower bounds and learning hardness results.
The method provides a template for turning arithmetic oscillation estimates into VC-dimension statements for restricted function classes on intervals of length R.

Load-bearing premise

The proof depends on the lower bound for the L1 norm of exponential sums with Möbius coefficients from Pandey and Radziwiłł holding at the stated strength.

What would settle it

An algorithm that learns the class H_R(η_R) with o(R) samples, or a proof that FR(μ_R) is o(R^{-1/4}), would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.14482 by A. Iosevich, A. Sant, W. Burstein.

**Figure 1.** Figure 1: The Fourier Ratio R(R) for the M¨obius function as a function of R on a logarithmic scale. The dashed line marks the benchmark p 2/π ≈ 0.7979. Caveats. The numerical experiments are limited to R ≤ 107 , which is very small by analytic number theory standards. Phenomena such as the Littlewood oscillation for π(x) (where π(x) − li(x) changes sign infinitely often, with the first sign change occurring beyond … view at source ↗

**Figure 2.** Figure 2: The deviation |R(R)− p 2/π| on a log–log scale for R ≥ 103 . The dashed line is a reference slope included only as a visual guide. 5.3. Comparison with other arithmetic functions. We compare the M¨obius function with several related examples at R = 106 . For functions with nonzero average, we consider centered versions to reflect oscillation rather than overall bias. For each real-valued function f on {1, … view at source ↗

**Figure 3.** Figure 3: The Fourier Ratio for five arithmetic functions at R = 106 . The dashed line marks the benchmark p 2/π. 6. Open Problems The results of this paper raise several questions that we believe merit further investigation. Problem 6.1 (Improve the L 1 lower bound). Pandey and Radziwi l l [15] proved that ∥FR∥L1 ≫ϵ R1/4−ϵ . Is the true order of growth R1/2 ? This would be consistent with the Gaussian heuristic an… view at source ↗

read the original abstract

We establish a connection between analytic number theory and computational learning theory by showing that the M\"obius function belongs to a class of functions that is statistically hard to learn from random samples. Let $\mu_R$ denote the restriction of the M\"obius function to the squarefree integers in $\{1,\dots,R\}$. Using a recent lower bound of Pandey and Radziwi{\l}{\l} for the $L^1$ norm of exponential sums with M\"obius coefficients, we prove that \[ \FR(\mu_R) \gg R^{-1/4-\epsilon} \] for every $\epsilon>0$. We then show that, for a suitable absolute constant $c_0>0$, the class of $\{-1,1\}$-valued functions on the squarefree integers with Fourier Ratio at least $c_0$ has Vapnik--Chervonenkis dimension at least $cR$. It follows that any distribution-independent learning algorithm that succeeds uniformly on the class $\mathcal{H}_R(\eta_R)$ containing $\mu_R$, where $\eta_R \to 0$, requires at least $\Omega(R)$ samples. We also discuss a conditional improvement under a strong uniform bound for additive twists of the M\"obius function, and we note that the same method applies to the Liouville function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper derives an Omega(R) sample lower bound for learning the restricted Mobius function by translating a recent L1 exponential sum bound into a Fourier ratio lower bound and then a VC-dimension claim.

read the letter

The main point is that this paper shows the Mobius function, when restricted to squarefrees up to R, requires linearly many samples to learn in a distribution-independent way. They get this by converting a recent L1 exponential sum lower bound into a Fourier ratio lower bound of R^{-1/4-eps}, then using that to bound the VC dimension from below by cR for the class containing the restricted Mobius function. It follows that uniform learning needs Omega(R) samples, and the same method applies to the Liouville function with a note on a conditional improvement under a stronger twist bound.

Referee Report

2 major / 3 minor

Summary. The manuscript claims to link analytic number theory and computational learning theory by restricting the Möbius function μ to squarefree integers up to R (denoted μ_R) and proving, via a recent L1 lower bound of Pandey and Radziwiłł on exponential sums with Möbius coefficients, that the Fourier ratio satisfies FR(μ_R) ≫ R^{-1/4-ε} for every ε > 0. It then argues that any class of {-1,1}-valued functions on squarefree integers with Fourier ratio at least a fixed positive constant has VC-dimension at least cR for an absolute c > 0; consequently the class H_R(η_R) containing μ_R (with η_R → 0) requires Ω(R) samples for any distribution-independent learning algorithm that succeeds uniformly. The same method is said to apply to the Liouville function, and a conditional improvement is discussed under a strong uniform bound on additive twists of μ.

Significance. If the quantitative translation from the imported L1 bound to the Fourier-ratio lower bound holds with the stated exponent, the work supplies a concrete arithmetic example of a statistically hard-to-learn class whose sample complexity is linear in the domain size. The explicit dependence on an external result is transparent, the extension to the Liouville function is noted, and the conditional improvement is clearly separated. These features give the paper modest but genuine interdisciplinary interest; the primary novelty lies in the learning-theoretic interpretation rather than in new number-theoretic estimates.

major comments (2)

[Proof of the Fourier-ratio lower bound (immediately after the statement of the Pandey–Radziwiłł theorem)] In the derivation that FR(μ_R) ≫ R^{-1/4-ε} (the step immediately following the citation of the Pandey–Radziwiłł L1 lower bound), confirm that the passage from ||∑_{n≤x} μ(n) e(α n)||_1 to a lower bound on the Fourier ratio preserves the exponent -1/4-ε without an extra logarithmic loss. Provide the precise inequality or lemma that converts the L1 norm into the ratio FR(μ_R) = max_α |∑ μ(n) e(α n)| / ||μ_R||_2 or its equivalent definition used in the paper.
[VC-dimension lower bound for H_R(η_R)] In the argument that Fourier ratio ≥ c_0 implies VC-dimension ≥ c R for the class of {-1,1}-valued functions on squarefree integers (the step leading to the Ω(R) sample-complexity claim), verify that the constant c_0 can be chosen independently of R and that the class H_R(η_R) containing μ_R satisfies the uniform-success condition with η_R → 0. State explicitly the relation between the Fourier-ratio threshold and the VC-dimension lower bound.

minor comments (3)

[Introduction / Notation] Define the Fourier ratio FR(f) with an equation number at the first appearance and state its precise normalization (e.g., whether the denominator is the L2 norm on the squarefree support or another quantity).
[References] Supply the full bibliographic details (arXiv number or journal reference) for the Pandey–Radziwiłł result in the bibliography.
[Conditional improvement paragraph] In the discussion of the conditional improvement, clarify whether the stronger uniform bound on additive twists is assumed only for the Möbius function or also for the Liouville function.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for isolating the two technical steps that benefit from explicit clarification. We address both major comments below and will revise the manuscript to incorporate the requested details and explicit statements.

read point-by-point responses

Referee: [Proof of the Fourier-ratio lower bound (immediately after the statement of the Pandey–Radziwiłł theorem)] In the derivation that FR(μ_R) ≫ R^{-1/4-ε} (the step immediately following the citation of the Pandey–Radziwiłł L1 lower bound), confirm that the passage from ||∑_{n≤x} μ(n) e(α n)||_1 to a lower bound on the Fourier ratio preserves the exponent -1/4-ε without an extra logarithmic loss. Provide the precise inequality or lemma that converts the L1 norm into the ratio FR(μ_R) = max_α |∑ μ(n) e(α n)| / ||μ_R||_2 or its equivalent definition used in the paper.

Authors: The conversion is immediate and introduces no logarithmic loss. Let S(α) = ∑_{n≤R squarefree} μ(n) e(α n). The cited Pandey–Radziwiłł theorem yields ∫_0^1 |S(α)| dα ≫ R^{1/4-ε} (with the squarefree restriction absorbed into the implied constant). The elementary inequality max_α |S(α)| ≥ ∫_0^1 |S(α)| dα then gives max_α |S(α)| ≫ R^{1/4-ε}. Because ||μ_R||_2 = Θ(R^{1/2}), we obtain FR(μ_R) = max_α |S(α)| / ||μ_R||_2 ≫ R^{1/4-ε} / R^{1/2} = R^{-1/4-ε}. Any polylogarithmic factors in the L^1 bound are absorbed into the ε. We will insert this short chain of inequalities immediately after the citation of the Pandey–Radziwiłł theorem. revision: yes
Referee: [VC-dimension lower bound for H_R(η_R)] In the argument that Fourier ratio ≥ c_0 implies VC-dimension ≥ c R for the class of {-1,1}-valued functions on squarefree integers (the step leading to the Ω(R) sample-complexity claim), verify that the constant c_0 can be chosen independently of R and that the class H_R(η_R) containing μ_R satisfies the uniform-success condition with η_R → 0. State explicitly the relation between the Fourier-ratio threshold and the VC-dimension lower bound.

Authors: The constant c_0 > 0 is absolute and chosen independently of R; it is fixed once and for all so that every function with Fourier ratio at least c_0 has a uniformly positive correlation with a suitable character, which is the key ingredient in the shattering argument that yields VC-dimension at least c R with c > 0 absolute. The class H_R(η_R) consists of all {-1,1}-valued functions on the squarefree integers up to R whose Fourier ratio is at least η_R, where we set η_R ≍ R^{-1/4-ε} so that η_R → 0 while still ensuring μ_R ∈ H_R(η_R). The VC lower bound is proved for the subclass of functions with Fourier ratio at least the fixed c_0. For all sufficiently large R we have η_R < c_0, whence this subclass is contained in H_R(η_R). The VC-dimension of H_R(η_R) is therefore at least as large as that of the subclass, i.e., at least c R. Any distribution-independent learner that succeeds uniformly on the whole class H_R(η_R) must in particular succeed on the subclass, so the Ω(R) sample-complexity lower bound carries over directly. We will add a short paragraph making the inclusion and the inheritance of the VC lower bound explicit. revision: yes

Circularity Check

0 steps flagged

No circularity: central lower bound imported from independent external source

full rationale

The paper's derivation imports the key L1 lower bound on Möbius exponential sums directly from the independent Pandey-Radziwiłł result (not a self-citation). It then applies this external fact to the paper's own definition of the Fourier ratio FR(μ_R) to obtain the stated lower bound, and feeds that into a standard VC-dimension argument for the class H_R(η_R). No step reduces a claimed prediction or uniqueness result back to a fitted parameter, self-definition, or author-overlapping citation chain inside the present work. The sample-complexity conclusion follows mechanically from the VC lower bound once the external input is granted, satisfying the self-contained criterion against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The argument rests on one external analytic lower bound and on standard definitions from learning theory; no free parameters or new invented entities are introduced in the abstract.

axioms (1)

domain assumption Pandey-Radziwiłł lower bound on the L1 norm of exponential sums with Möbius coefficients
Invoked explicitly to obtain the Fourier-ratio lower bound for μ_R.

pith-pipeline@v0.9.0 · 5545 in / 1463 out tokens · 28733 ms · 2026-05-10T11:42:42.649100+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 5 canonical work pages

[1]

Aldahleh, W

K. Aldahleh, W. Burstein, G. Garza, A. Iosevich, J. Iosevich, A. Khalil, J. King, T. Le, I. Li, A. Mayeli, K. Nguyen, and N. Shaffer,The Fourier ratio and complexity of signals. (arXiv:2511.19560), (2025). 2

work page arXiv 2025
[2]

Apostol,Introduction to Analytic Number Theory, Springer-Verlag, New York, (1976)

T. Apostol,Introduction to Analytic Number Theory, Springer-Verlag, New York, (1976). 5

1976
[3]

R. C. Baker and G. Harman,Exponential sums formed with the M¨ obius function, J. London Math. Soc. (2) 43 (1991), no. 2, 193–198. 3, 6, 13

1991
[4]

Balog and I

A. Balog and I. Z. Ruzsa,On the exponential sum overr-free integers, Acta Math. Hungar. 90 (2001), 219–230. 5

2001
[5]

Bourgain,On the correlation of the M¨ obius function with rank-one systems, Journal d’Analyse Math´ ematique 120 (2013), 105–130

J. Bourgain,On the correlation of the M¨ obius function with rank-one systems, Journal d’Analyse Math´ ematique 120 (2013), 105–130. 4

2013
[6]

Davenport,Multiplicative Number Theory, 2nd ed., Springer-Verlag, New York, (1980)

H. Davenport,Multiplicative Number Theory, 2nd ed., Springer-Verlag, New York, (1980). 2

1980
[7]

Deodhar and A

S. Deodhar and A. Iosevich,Spectral synthesis with the complexity parameter, (preprint), (2026). 2

2026
[8]

Ehrenfeucht, D

A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant,A general lower bound on the number of examples needed for learning, Information and Computation, 82(3):247–261, (1989). 9

1989
[9]

Green,The M¨ obius function is strongly orthogonal to nilsequences, Annals of Mathematics 175 (2012), 541–566

B. Green,The M¨ obius function is strongly orthogonal to nilsequences, Annals of Mathematics 175 (2012), 541–566. 4

2012
[10]

Haagerup,The best constants in the Khintchine inequality, Studia Mathematica, 70(3):231–283, (1981)

U. Haagerup,The best constants in the Khintchine inequality, Studia Mathematica, 70(3):231–283, (1981)

1981
[11]

Iosevich, Z

A. Iosevich, Z. Li, E. Palsson, and A. Yavicoli,The Fourier Ratio: Uncertainty, Restriction, and Approximation for Compactly Supported Measures, arXiv:2512.16751, (2025). 2

work page arXiv 2025
[12]

Iosevich, A

A. Iosevich, A. Mayeli, and E. Wyman,Spectral synthesis on Riemannian manifolds, preprint, arXiv:2603.21451 (2026). 2

work page arXiv 2026
[13]

S. V. Konyagin,On the Littlewood problem, Izv. Akad. Nauk SSSR Ser. Mat. 45 (1981), 243–265. 2

1981
[14]

H. L. Montgomery, H. L. Pigno, and K. Smith,A generalization of a theorem of Littlewood, Proc. Amer. Math. Soc. 83 (1981), 341–345. 2

1981
[15]

Pandey and M

M. Pandey and M. Radziwi l l,Means of exponential sums with multiplicative coefficients. I., Compositio Mathematica, (to appear). 3, 5, 6, 13, 14
[16]

Pandey and M

M. Pandey and M. Radziwi l l,Means of exponential sums with multiplicative coefficients. II., arXiv:2510.20194. 2

work page arXiv
[17]

Rodgers, J

B. Rodgers, J. Bennatar, and N. Nishry,Gaussian distribution for short sums of multiplicative functions, arXiv:2012.15507. 10

work page arXiv 2012
[18]

Sarnak,Three lectures on the M¨ obius function, randomness and dynamics, Lecture notes, IAS, 2010

P. Sarnak,Three lectures on the M¨ obius function, randomness and dynamics, Lecture notes, IAS, 2010. 2

2010
[19]

Tao,The Chowla conjecture and the Sarnak conjecture, blog post, 2012

T. Tao,The Chowla conjecture and the Sarnak conjecture, blog post, 2012. 2

2012
[20]

V. N. Vapnik and A. Ya. Chervonenkis,On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, 16(2):264–280, (1971). 9 Department of Mathematics, University of Rochester, Rochester, NY, USA Email address:willburst88@gmail.com Department of Mathematics, University of Rochester, Rochest...

1971