The Fast Mixing Mechanism for Differential Privacy

Ashia C. Wilson; Katrina Ligett; Moshe Shenfeld; Omri Lev; Vishwak Srinivasan

arxiv: 2605.30600 · v1 · pith:PPV2GFBWnew · submitted 2026-05-28 · 💻 cs.LG · cs.IT· math.IT

The Fast Mixing Mechanism for Differential Privacy

Omri Lev , Moshe Shenfeld , Vishwak Srinivasan , Katrina Ligett , Ashia C. Wilson This is my paper

Pith reviewed 2026-06-29 08:11 UTC · model grok-4.3

classification 💻 cs.LG cs.ITmath.IT

keywords differential privacysketchingfast transformslinear regressionordinary least squaresprivacy guaranteescomputational efficiency

0 comments

The pith

A fast-transform DP sketching mechanism matches Gaussian privacy guarantees up to constants while running at classical fast-sketch speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a differential privacy sketching mechanism constructed from structured fast transforms such as the Hadamard matrix. It proves that this mechanism delivers privacy guarantees that, in favorable regimes, match the Gaussian sketch up to a constant factor. The same mechanism is then combined with existing sketch-based DP linear regression methods to produce an algorithm for differentially private ordinary least squares that improves runtime while retaining strong utility. A reader would care because prior DP sketching approaches using Gaussian matrices incurred high computational cost that this construction avoids.

Core claim

The fast mixing mechanism is a DP sketching method built on fast transforms that, in certain regimes, matches both the runtime of classical fast sketching and the privacy guarantees of the Gaussian sketch up to a constant factor; when plugged into recent sketch-based DP linear regression pipelines it yields the first fast algorithm for DP ordinary least squares with established privacy and accuracy bounds.

What carries the argument

The fast mixing mechanism: a differential privacy sketching construction that replaces Gaussian matrices with structured fast transforms while preserving the required concentration and privacy properties.

If this is right

The mechanism achieves state-of-the-art privacy guarantees that match Gaussian sketches up to a constant factor in favorable regimes.
Runtime matches that of classical fast sketching methods in the cases where the transforms can be applied efficiently.
The construction produces the first fast algorithm for differentially private ordinary least squares when combined with existing sketch-based DP regression methods.
Strong utility guarantees are retained alongside the improved runtime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fast-transform construction could be tested on other linear-algebra primitives such as ridge regression or PCA to see whether the constant-factor privacy matching extends beyond ordinary least squares.
If the concentration properties hold only for specific matrix dimensions, the method may require regime-specific tuning that the current analysis does not yet quantify.
Structured fast transforms might replace Gaussian matrices in other DP mechanisms that currently rely on random projections, provided the privacy analysis can be carried through.

Load-bearing premise

The fast transforms preserve the necessary concentration and privacy properties required for the utility and privacy analyses to carry over from Gaussian sketches without additional error terms or regime restrictions beyond those stated.

What would settle it

A side-by-side numerical check, on the same matrix dimensions and noise level, of the empirical privacy loss (measured via the tightest epsilon-delta pair that the mechanism satisfies) for the fast-transform sketch versus the Gaussian sketch; the claim fails if the fast-transform loss exceeds the Gaussian loss by more than a small constant factor in the stated favorable regimes.

Figures

Figures reproduced from arXiv: 2605.30600 by Ashia C. Wilson, Katrina Ligett, Moshe Shenfeld, Omri Lev, Vishwak Srinivasan.

**Figure 1.** Figure 1: The additive noise scale σ required to attain target ε for Lemma 1 compared with Upadhyay [2014] and GaussMix [Lev et al., 2025, Lemma. 1] for n ∈ {104 , 105}. We set λ X min = n 8dC 2 X and λ SfX min = 5 6 λ X min. We plot the noise required by FastMix with C 2 SfX = 2C 2 X, and in the extreme case C 2 SfX → ∞. The curves use the exact characterization instead of (3) (see Appendix C for details). Proposit… view at source ↗

**Figure 2.** Figure 2: Performance comparison of FastIHM and IHM. The upper plots are for n d ≈ 1450 and the lower for n d ≈ 16000. For each dataset, we use three values of k2, represented by r1, r2, and r3. The plots illustrate the dependence on λ X max and λ X min, where smaller λ X max and larger λ X min improve performance. Moreover, in certain cases, FastIHM matches the performance of IHM. Dataset (r1, r2, r3) Fast IHS Fast… view at source ↗

**Figure 3.** Figure 3: The ratio between the runtime of IHM and each of the different methods (FastIHS and [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Accuracy-runtime trade-off for the Airline dataset. [PITH_FULL_IMAGE:figures/full_fig_p049_4.png] view at source ↗

read the original abstract

Randomized sketching is a central tool for compressing large-scale optimization problems while preserving accuracy. In particular, sketches that are based on structured matrices, such as the Hadamard matrix, can be applied efficiently and often yield solutions that approximate those of the original problem at much lower computational cost. In differential privacy (DP), Gaussian sketching has been used to solve DP linear regression, beginning with \citet{sheffet2017differentially, sheffet2019old} and later refined by \citet{lev2025gaussianmix, lev2026near}. However, although these methods achieve strong utility guarantees, they usually do not improve runtime over classical DP approaches. In this work, we introduce a new DP sketching mechanism based on fast transforms, which, in certain cases, matches the runtime of classical fast sketching methods. We prove state-of-the-art privacy guarantees for this mechanism and show that, in favorable regimes, they match those of the Gaussian sketch up to a constant factor. As an application, we combine this mechanism with recent sketch-based methods for DP linear regression to obtain a new algorithm with strong utility and improved runtime. We establish privacy and accuracy guarantees for this algorithm, yielding, to the best of our knowledge, the first fast method for DP ordinary least squares.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Paper introduces fast-transform DP sketch that matches Gaussian privacy up to constants in good regimes and delivers the first fast DP OLS algorithm.

read the letter

The main takeaway is a new DP sketching mechanism built on fast transforms that the authors prove delivers privacy bounds matching the Gaussian sketch up to a constant factor in favorable regimes. They then plug it into recent sketch-based DP linear regression methods to produce the first fast algorithm for differentially private ordinary least squares.

The work does a clean job extending the Gaussian sketching line from Sheffet and Lev et al. The runtime improvement over prior DP sketching is concrete, and the abstract states the privacy and accuracy guarantees explicitly enough to evaluate. The application section shows how the mechanism slots into existing DP OLS pipelines without changing the utility story.

The soft spot is whether the structured fast transforms actually inherit the exact sub-Gaussian tails and rotational invariance that the Gaussian analyses rely on. The stress-test concern is reasonable: JL-type embeddings from Hadamard or similar matrices often need larger sketch sizes or introduce logarithmic factors in failure probability. The paper must re-derive the sensitivity and tail bounds from first principles for the fast case rather than importing Gaussian lemmas directly. If the proofs do that without hidden regime restrictions, the constant-factor claim holds; if they do not, the match slips outside narrow parameter settings.

This is for people working on practical DP optimization and sketching. It is worth a serious referee because the runtime claim is new, the privacy statements are checkable, and the underlying math is presented as independent of fitted parameters.

Referee Report

1 major / 2 minor

Summary. The paper introduces a fast mixing mechanism for differential privacy based on structured fast transforms (e.g., randomized Hadamard), proves state-of-the-art privacy guarantees for the mechanism, shows that these guarantees match those of the Gaussian sketch up to a constant factor in favorable regimes, and applies the mechanism to obtain the first fast algorithm for DP ordinary least squares with matching utility.

Significance. If the central claims hold, the work would be significant for large-scale DP optimization: it combines the runtime benefits of classical fast sketching with DP guarantees that avoid the slowdown of Gaussian methods, while delivering the first fast DP OLS solver. The explicit re-derivation of privacy and utility bounds from the properties of fast transforms (rather than black-box transfer) would be a notable technical contribution.

major comments (1)

[Privacy analysis of the fast mixing mechanism] The privacy analysis for the fast mixing mechanism (main theorem on privacy guarantees): the claim that the mechanism matches Gaussian-sketch privacy up to a constant factor requires an explicit first-principles derivation of the relevant sub-Gaussian tail bounds and sensitivity for the structured fast transform; if the proof invokes rotational invariance or concentration lemmas that hold only for i.i.d. Gaussian entries without re-deriving the failure probabilities or log factors for the fast case, the constant-factor statement does not hold outside narrow parameter regimes.

minor comments (2)

[Abstract and §1] The abstract and introduction repeatedly use the qualifiers 'in certain cases' and 'favorable regimes' without stating the precise conditions on sketch size, dimension, or failure probability; these should be stated explicitly (e.g., as a displayed assumption or theorem statement) so readers can assess applicability.
[Preliminaries] Notation for the fast transform matrix and the mixing parameter should be introduced once with a clear definition before the first use in the privacy proof, to avoid ambiguity when comparing to the Gaussian case.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the importance of a rigorous first-principles privacy analysis. We address the single major comment below.

read point-by-point responses

Referee: [Privacy analysis of the fast mixing mechanism] The privacy analysis for the fast mixing mechanism (main theorem on privacy guarantees): the claim that the mechanism matches Gaussian-sketch privacy up to a constant factor requires an explicit first-principles derivation of the relevant sub-Gaussian tail bounds and sensitivity for the structured fast transform; if the proof invokes rotational invariance or concentration lemmas that hold only for i.i.d. Gaussian entries without re-deriving the failure probabilities or log factors for the fast case, the constant-factor statement does not hold outside narrow parameter regimes.

Authors: We agree that a first-principles derivation is essential and believe the manuscript already supplies one. In the proof of the main privacy theorem (Theorem 3.2), we derive the sub-Gaussian norm and tail bounds directly from the mixing properties of the fast structured transform (randomized Hadamard followed by the mixing matrix) and the coordinate-wise sensitivity of the sketch. The argument uses only the known concentration of the fast transform (via its recursive structure and union bounds over the Walsh basis) together with a direct calculation of the privacy loss random variable; it does not invoke rotational invariance of Gaussians or any Gaussian-specific lemmas. The resulting failure probabilities and logarithmic factors are tracked explicitly and appear in the final constant-factor comparison (see the regime statements in Theorem 3.2 and the discussion following Corollary 3.3). We are therefore confident that the constant-factor claim holds in the stated favorable regimes. If the referee can point to a specific step that appears to rely on an un-rederived Gaussian fact, we will gladly expand that step in the revision. revision: no

Circularity Check

0 steps flagged

No circularity; new mechanism's privacy analysis presented as independent

full rationale

The abstract and context describe introduction of a new fast-transform DP sketching mechanism with claimed proofs of privacy guarantees that match Gaussian sketches up to constant factor in favorable regimes, plus an application to DP OLS. No equations, fitted parameters, or derivations are shown reducing by construction to inputs, self-citations, or ansatzes. Self-citations to Lev et al. (2025/2026) supply background on prior Gaussian work but are not invoked as the sole justification for the new mechanism's bounds; the paper states it proves the new guarantees separately. This matches the default expectation of a self-contained derivation with no load-bearing reduction to prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities beyond the mechanism name itself; full paper would be needed to audit.

invented entities (1)

Fast mixing mechanism no independent evidence
purpose: DP sketching via fast transforms with runtime matching classical methods
New mechanism introduced to achieve the stated privacy and speed properties

pith-pipeline@v0.9.1-grok · 5767 in / 1129 out tokens · 32091 ms · 2026-06-29T08:11:26.247714+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 1 canonical work pages · 1 internal anchor

[1]

A History of Census Privacy Protections

United States Census Bureau. A History of Census Privacy Protections. https://www.census. gov/library/visualizations/2019/comm/history-privacy-protection.html,

2019
[2]

Manuel Gil, Fady Alajaji, and Tamas Linder

URL https://research.facebook.com/blog/2020/6/ protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/. Manuel Gil, Fady Alajaji, and Tamas Linder. R´ enyi divergence measures for commonly used univariate continuous distributions.Information Sciences, 249:124–131,

2020
[3]

Tamas Sarlos

Accessed: 2022-07-28. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 2006 47th annual IEEE symposium on foundations of computer science, pages 143–152. IEEE,

2022
[4]

Randomness Efficient Fast-Johnson-Lindenstrauss Transform with Applications in Differential Privacy and Compressed Sensing

Jalaj Upadhyay. Randomness-efficient fast Johnson–Lindenstrauss transform with applications in differential privacy and compressed sensing.arXiv preprint arXiv:1410.2470,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

We will first provide a proof of Lemma

C Proof of Theorem 1 Our proof will use the next standard formula for the R´ enyi divergence between two d-dimensional multivariate Gaussian distributions [Gil et al., 2013] Dα N ⃗0d,Σ 1 N ⃗0d,Σ 2 =− 1 2(α−1) log det (Σ1 +α(Σ 2 −Σ 1)) (det (Σ1))1−α (det (Σ2))α (6) which holds for allαsuch thatαΣ −1 1 + (1−α)Σ −1 2 = Σ−1 2 +α Σ−1 1 −Σ −1 2 ≻0. We will firs...

2013
[6]

[2025, Corollary

Thus, using Lev et al. [2025, Corollary. 1], we know that this is upper bounded by 1 2(α−1) α·log 1− 5 4γ −log 1− 5α 4γ ≤ 25α 32γ2 for all α∈ (1, 8γ/25) and provided that γ≥ 25/8. This establishes the bound presented in Lemma

2025
[7]

whenever S⊤ f Sf = In. In this case, we getC 2 Sf X = 1, and the overall divergence becomes 1 2(α−1) α·log 1− 1 λX min +σ 2 −log 1− α λX min +σ 2 and for λX min+σ2 > 1 and 1 < α < λ X min+σ2, which recovers exactly the analysis from Lev et al. [2025, Appendix. B]. We note that in this case, Sf is an orthogonal matrix, sog ⊤Sf ∼N (⃗0n,I n), which is distri...

2025
[8]

Lemma 3.Letu, v∈R d

Throughout, we use the notion of ℓ1 sensitivity, which, given a target functionf(X), is defined by max (X,X ′):X ′≃X f(X)−f(X ′) . Lemma 3.Letu, v∈R d. Then, uv⊤ op =∥u∥ ∥v∥, uv⊤ +vu ⊤ op =∥u∥ ∥v∥+ u⊤v . Proof. If either u = ⃗0d or v = ⃗0d, both identities are immediate. Hence, assume u, v̸ = ⃗0d, and define u := u ∥u∥ , v := v ∥v∥ . We first calculate th...

1991
[9]

First, note that for the τ stated in the theorem and by using a union-bound, it holds P(E c)≤P bλ≤ eλ +P(em≤bm) =P(z 1 ≥τ) +P(z 2 ≥τ)≤ 2δ 3

o wherez 1 ∼Lap(0,1) eλ max n 0, λSf X min −ω(bΛ + 2em)(τ−z2) o wherez 2 ∼Lap(0,1) Let E := n eλ≤bλ T(em≥bm) o . First, note that for the τ stated in the theorem and by using a union-bound, it holds P(E c)≤P bλ≤ eλ +P(em≤bm) =P(z 1 ≥τ) +P(z 2 ≥τ)≤ 2δ 3 . From now on, we prove the privacy guarantees of the mechanism conditioned on the event E, where by Lem...

2014
[10]

Computing Z.Involves applying an SRHT of size ( k2, n) to X∈R n×d

C.4.2 Runtime Complexity We provide the proof by explicitly writing the complexity of the different steps inside FastMix and their runtime complexity. Computing Z.Involves applying an SRHT of size ( k2, n) to X∈R n×d. Following classical arguments (see, for example, Ailon and Liberty [2009]), the runtime complexity of this step is O(ndlog (k2)). However, ...

2009
[11]

Therefore, U ⊤ S⊤ HSH −I n ei = R⊤ i Q⊤ i S⊤ HSH −I n Qiqi ≤ Q⊤ i S⊤ HSH −I n Qi op

Since both U and ei belong to span(Qi), there existR i andq i such that U=Q iRi, e i =Q iqi,∥R i∥op ≤1,∥q i∥2 ≤1. Therefore, U ⊤ S⊤ HSH −I n ei = R⊤ i Q⊤ i S⊤ HSH −I n Qiqi ≤ Q⊤ i S⊤ HSH −I n Qi op . Hence X ⊤ S⊤ HSH −I n ei ≤ ∥X∥ op Q⊤ i S⊤ HSH −I n Qi op . Then, for each fixedi, by Cohen et al. [2016, Theorem. 9] it holds that E Q⊤ i S⊤ HSH −I n Qi 2 op...

2016
[12]

By Markov’s inequality, this further implies that P Q⊤ i S⊤ HSH −I n Qi op > χ ≤δ i

+ log 1 χδi log r+ 1 δi for a universal constantC >0. By Markov’s inequality, this further implies that P Q⊤ i S⊤ HSH −I n Qi op > χ ≤δ i. Consequently, P X ⊤ S⊤ HSH −I n ei > χ∥X∥ op ≤δ i. Then, choosingδ i := ϱ/nand union bounding overi∈[n] yields P max i∈[n] X ⊤ S⊤ HSH −I n ei > χ∥X∥ op ≤ nX i=1 δi =ϱ. This completes the proof. Lemma 10.Let X∈R n×d and...

2016
[13]

Throughout, quantities computed in the t’th call toFastMixare indexed byt for example,emt,eλt,S H,t,S G,t . Our proof uses Lemma 7 and Lemma 1 to calculate the RDP guarantees for each step forming eGt and eXt, and then applies the RDP composition theorem to obtain their overall guarantees separately. In particular, we separate the contribution of the Lapl...

2020
[14]

and seek to obtain an high probability bound on ∥bθT −θ ∗∥2 X ⊤X, which corresponds to the excess empirical risk [Wang, 2018, Lemma. 5]. Recall that in terms of notation, emt denote the private version of bmt andeλt the private version ofbλt, wherebλt :=λ SH,tX min .For anya≥0, we define X a := (X ⊤, aI d)⊤ andM a :=X ⊤X+a 2Id = X ⊤ a X a. Step 1: Definin...

2018
[15]

We note that the eventB tot(χ) implies λSH,tX min ≥(1−χ)λ X min and max i∈[n] X ⊤ S⊤ H,tSH,t −I n ei ≤ χ 2 ∥X∥op for allt= 0, . . . , T−1. This holds since for everyi∈[n], the eventB 2,t(χ) implies X ⊤ S⊤ H,tSH,t −I n ei = sup ∥v∥=1 e⊤ i S⊤ H,tSH,t −I n Xv ≤ χ 2 sup ∥v∥=1 ∥Xv∥= χ 2 ∥X∥op . Conditioning onB tot(χ), we next define for eacht= 0, . . . , T−1 ...

2016
[16]

[2026, Appendix

As in Lev et al. [2026, Appendix. C], for χ below a sufficiently small constant less than 1 4, attainable by increasing k1 and k2 by constant factors, this bound is less than C whenever C≥a 0 ·max{C Y ,∥θ ∗∥} and k1 ≥a 1 ·κ X(η2)·d, k 2 ≥a 1 ·κ X(η2)·d·log 4(n) for constantsa 0, a1. Now, to further simplify this argument, we note that κX η2 = λX max −λ X ...

2026
[17]

we get that the constraints onk 1, k2 can be enforced by setting k1 ≥a 2 · d· λX max −λ X min εp log(1/δ) !2/3 , k 2 ≥a 2 · d· λX max −λ X min εp log(1/δ) ·log 4(n) !2/3 . Overall, conditioned onA tot clipping does not occur provided thatC≥a 0 ·max{C Y ,∥θ ∗∥}and k1χ2 ≥a 2 ·max d, gX ,log 8T¯cG ϱ , √ 2d+ p 4 log (8T/ϱ) 2 , k2χ2 ≥a 2 ·max ( rank(X) log4(n)...

2026
[18]

[2021, Section 4.1]), b∆≥ s n−k k(n−1)

Moreover, by Welch’s lower bound [Welch, 1974, Theorem 1] (see also the discussion in Haikin et al. [2021, Section 4.1]), b∆≥ s n−k k(n−1) . In particular, when n k ≫ 1, the optimal coherence scale permitted by Welch’s bound is Θ k−1/2 , while for n≈k + 1 it might get as small as Θ k−1 . For the SRHT, as shown in Appendix E, w.p. at least 1−ϱthere exists ...

1974
[19]

For performing the Hadamard transform, we have used the optimized implementation from https://github.com/falconn-lib/ffht

The airline dataset was subsampled to 4194304 points, to allow for reasonable solution times. For performing the Hadamard transform, we have used the optimized implementation from https://github.com/falconn-lib/ffht. We report the excess empirical risk. First, we calculate the normalized mean squared error (MSE) for the train set, computed as the squared ...

2026
[20]

For both schemes, we omitted the term −η2bθt from the computation of eGt, as it improves performance by making the iterates converge closer to θ∗ than to θ∗(η2)

For FastIHM, we used k1 = 6 ·max{d,log( 4T/ϱ)}, T = 4 and C = CY . For both schemes, we omitted the term −η2bθt from the computation of eGt, as it improves performance by making the iterates converge closer to θ∗ than to θ∗(η2). This does not affect the privacy analysis, so we omit this term in the empirical evaluation. Whenever we used the Gaussian mecha...

2018
[21]

and σ2 = 0 .1. The second dataset aims to test the performance of the algorithm in a poorly conditioned setting (termedsynthetic correlated), where usually the computational advantage of the IHS becomes prominent (see, for example, experiments done in Pilanci and Wainwright [2016], Lacotte et al. [2020]). To that end, we generate a design matrix built fro...

2016
[22]

In particular, we use the airline dataset [ope, 2020], subsampled to n = 222 samples, leads to n/d = 222 /4 =

I Additional Experiments We now provide additional experiments for a setting where the ratio n/d is extremely large, rep- resenting a setting where sketching is most beneficial in terms of runtime. In particular, we use the airline dataset [ope, 2020], subsampled to n = 222 samples, leads to n/d = 222 /4 =

2020
[23]

We sweep across values of k2 max{d,log(4T/ϱ)} ∈ [4, 1200] and fix T = 4 , k1 = 6 ·max{d,log( 4T/ϱ)} and k = 6 ·max{d,log( 4T/ϱ)}. As seen in Figure 4, in this regime, FastIHM allows for attaining close-to- optimal performance at a significantly lower runtime, demonstrating the most favorable regime for sketching. J Dataset Statistics The next table summar...

2018

[1] [1]

A History of Census Privacy Protections

United States Census Bureau. A History of Census Privacy Protections. https://www.census. gov/library/visualizations/2019/comm/history-privacy-protection.html,

2019

[2] [2]

Manuel Gil, Fady Alajaji, and Tamas Linder

URL https://research.facebook.com/blog/2020/6/ protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/. Manuel Gil, Fady Alajaji, and Tamas Linder. R´ enyi divergence measures for commonly used univariate continuous distributions.Information Sciences, 249:124–131,

2020

[3] [3]

Tamas Sarlos

Accessed: 2022-07-28. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 2006 47th annual IEEE symposium on foundations of computer science, pages 143–152. IEEE,

2022

[4] [4]

Randomness Efficient Fast-Johnson-Lindenstrauss Transform with Applications in Differential Privacy and Compressed Sensing

Jalaj Upadhyay. Randomness-efficient fast Johnson–Lindenstrauss transform with applications in differential privacy and compressed sensing.arXiv preprint arXiv:1410.2470,

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

We will first provide a proof of Lemma

C Proof of Theorem 1 Our proof will use the next standard formula for the R´ enyi divergence between two d-dimensional multivariate Gaussian distributions [Gil et al., 2013] Dα N ⃗0d,Σ 1 N ⃗0d,Σ 2 =− 1 2(α−1) log det (Σ1 +α(Σ 2 −Σ 1)) (det (Σ1))1−α (det (Σ2))α (6) which holds for allαsuch thatαΣ −1 1 + (1−α)Σ −1 2 = Σ−1 2 +α Σ−1 1 −Σ −1 2 ≻0. We will firs...

2013

[6] [6]

[2025, Corollary

Thus, using Lev et al. [2025, Corollary. 1], we know that this is upper bounded by 1 2(α−1) α·log 1− 5 4γ −log 1− 5α 4γ ≤ 25α 32γ2 for all α∈ (1, 8γ/25) and provided that γ≥ 25/8. This establishes the bound presented in Lemma

2025

[7] [7]

whenever S⊤ f Sf = In. In this case, we getC 2 Sf X = 1, and the overall divergence becomes 1 2(α−1) α·log 1− 1 λX min +σ 2 −log 1− α λX min +σ 2 and for λX min+σ2 > 1 and 1 < α < λ X min+σ2, which recovers exactly the analysis from Lev et al. [2025, Appendix. B]. We note that in this case, Sf is an orthogonal matrix, sog ⊤Sf ∼N (⃗0n,I n), which is distri...

2025

[8] [8]

Lemma 3.Letu, v∈R d

Throughout, we use the notion of ℓ1 sensitivity, which, given a target functionf(X), is defined by max (X,X ′):X ′≃X f(X)−f(X ′) . Lemma 3.Letu, v∈R d. Then, uv⊤ op =∥u∥ ∥v∥, uv⊤ +vu ⊤ op =∥u∥ ∥v∥+ u⊤v . Proof. If either u = ⃗0d or v = ⃗0d, both identities are immediate. Hence, assume u, v̸ = ⃗0d, and define u := u ∥u∥ , v := v ∥v∥ . We first calculate th...

1991

[9] [9]

First, note that for the τ stated in the theorem and by using a union-bound, it holds P(E c)≤P bλ≤ eλ +P(em≤bm) =P(z 1 ≥τ) +P(z 2 ≥τ)≤ 2δ 3

o wherez 1 ∼Lap(0,1) eλ max n 0, λSf X min −ω(bΛ + 2em)(τ−z2) o wherez 2 ∼Lap(0,1) Let E := n eλ≤bλ T(em≥bm) o . First, note that for the τ stated in the theorem and by using a union-bound, it holds P(E c)≤P bλ≤ eλ +P(em≤bm) =P(z 1 ≥τ) +P(z 2 ≥τ)≤ 2δ 3 . From now on, we prove the privacy guarantees of the mechanism conditioned on the event E, where by Lem...

2014

[10] [10]

Computing Z.Involves applying an SRHT of size ( k2, n) to X∈R n×d

C.4.2 Runtime Complexity We provide the proof by explicitly writing the complexity of the different steps inside FastMix and their runtime complexity. Computing Z.Involves applying an SRHT of size ( k2, n) to X∈R n×d. Following classical arguments (see, for example, Ailon and Liberty [2009]), the runtime complexity of this step is O(ndlog (k2)). However, ...

2009

[11] [11]

Therefore, U ⊤ S⊤ HSH −I n ei = R⊤ i Q⊤ i S⊤ HSH −I n Qiqi ≤ Q⊤ i S⊤ HSH −I n Qi op

Since both U and ei belong to span(Qi), there existR i andq i such that U=Q iRi, e i =Q iqi,∥R i∥op ≤1,∥q i∥2 ≤1. Therefore, U ⊤ S⊤ HSH −I n ei = R⊤ i Q⊤ i S⊤ HSH −I n Qiqi ≤ Q⊤ i S⊤ HSH −I n Qi op . Hence X ⊤ S⊤ HSH −I n ei ≤ ∥X∥ op Q⊤ i S⊤ HSH −I n Qi op . Then, for each fixedi, by Cohen et al. [2016, Theorem. 9] it holds that E Q⊤ i S⊤ HSH −I n Qi 2 op...

2016

[12] [12]

By Markov’s inequality, this further implies that P Q⊤ i S⊤ HSH −I n Qi op > χ ≤δ i

+ log 1 χδi log r+ 1 δi for a universal constantC >0. By Markov’s inequality, this further implies that P Q⊤ i S⊤ HSH −I n Qi op > χ ≤δ i. Consequently, P X ⊤ S⊤ HSH −I n ei > χ∥X∥ op ≤δ i. Then, choosingδ i := ϱ/nand union bounding overi∈[n] yields P max i∈[n] X ⊤ S⊤ HSH −I n ei > χ∥X∥ op ≤ nX i=1 δi =ϱ. This completes the proof. Lemma 10.Let X∈R n×d and...

2016

[13] [13]

Throughout, quantities computed in the t’th call toFastMixare indexed byt for example,emt,eλt,S H,t,S G,t . Our proof uses Lemma 7 and Lemma 1 to calculate the RDP guarantees for each step forming eGt and eXt, and then applies the RDP composition theorem to obtain their overall guarantees separately. In particular, we separate the contribution of the Lapl...

2020

[14] [14]

and seek to obtain an high probability bound on ∥bθT −θ ∗∥2 X ⊤X, which corresponds to the excess empirical risk [Wang, 2018, Lemma. 5]. Recall that in terms of notation, emt denote the private version of bmt andeλt the private version ofbλt, wherebλt :=λ SH,tX min .For anya≥0, we define X a := (X ⊤, aI d)⊤ andM a :=X ⊤X+a 2Id = X ⊤ a X a. Step 1: Definin...

2018

[15] [15]

We note that the eventB tot(χ) implies λSH,tX min ≥(1−χ)λ X min and max i∈[n] X ⊤ S⊤ H,tSH,t −I n ei ≤ χ 2 ∥X∥op for allt= 0, . . . , T−1. This holds since for everyi∈[n], the eventB 2,t(χ) implies X ⊤ S⊤ H,tSH,t −I n ei = sup ∥v∥=1 e⊤ i S⊤ H,tSH,t −I n Xv ≤ χ 2 sup ∥v∥=1 ∥Xv∥= χ 2 ∥X∥op . Conditioning onB tot(χ), we next define for eacht= 0, . . . , T−1 ...

2016

[16] [16]

[2026, Appendix

As in Lev et al. [2026, Appendix. C], for χ below a sufficiently small constant less than 1 4, attainable by increasing k1 and k2 by constant factors, this bound is less than C whenever C≥a 0 ·max{C Y ,∥θ ∗∥} and k1 ≥a 1 ·κ X(η2)·d, k 2 ≥a 1 ·κ X(η2)·d·log 4(n) for constantsa 0, a1. Now, to further simplify this argument, we note that κX η2 = λX max −λ X ...

2026

[17] [17]

we get that the constraints onk 1, k2 can be enforced by setting k1 ≥a 2 · d· λX max −λ X min εp log(1/δ) !2/3 , k 2 ≥a 2 · d· λX max −λ X min εp log(1/δ) ·log 4(n) !2/3 . Overall, conditioned onA tot clipping does not occur provided thatC≥a 0 ·max{C Y ,∥θ ∗∥}and k1χ2 ≥a 2 ·max d, gX ,log 8T¯cG ϱ , √ 2d+ p 4 log (8T/ϱ) 2 , k2χ2 ≥a 2 ·max ( rank(X) log4(n)...

2026

[18] [18]

[2021, Section 4.1]), b∆≥ s n−k k(n−1)

Moreover, by Welch’s lower bound [Welch, 1974, Theorem 1] (see also the discussion in Haikin et al. [2021, Section 4.1]), b∆≥ s n−k k(n−1) . In particular, when n k ≫ 1, the optimal coherence scale permitted by Welch’s bound is Θ k−1/2 , while for n≈k + 1 it might get as small as Θ k−1 . For the SRHT, as shown in Appendix E, w.p. at least 1−ϱthere exists ...

1974

[19] [19]

For performing the Hadamard transform, we have used the optimized implementation from https://github.com/falconn-lib/ffht

The airline dataset was subsampled to 4194304 points, to allow for reasonable solution times. For performing the Hadamard transform, we have used the optimized implementation from https://github.com/falconn-lib/ffht. We report the excess empirical risk. First, we calculate the normalized mean squared error (MSE) for the train set, computed as the squared ...

2026

[20] [20]

For both schemes, we omitted the term −η2bθt from the computation of eGt, as it improves performance by making the iterates converge closer to θ∗ than to θ∗(η2)

For FastIHM, we used k1 = 6 ·max{d,log( 4T/ϱ)}, T = 4 and C = CY . For both schemes, we omitted the term −η2bθt from the computation of eGt, as it improves performance by making the iterates converge closer to θ∗ than to θ∗(η2). This does not affect the privacy analysis, so we omit this term in the empirical evaluation. Whenever we used the Gaussian mecha...

2018

[21] [21]

and σ2 = 0 .1. The second dataset aims to test the performance of the algorithm in a poorly conditioned setting (termedsynthetic correlated), where usually the computational advantage of the IHS becomes prominent (see, for example, experiments done in Pilanci and Wainwright [2016], Lacotte et al. [2020]). To that end, we generate a design matrix built fro...

2016

[22] [22]

In particular, we use the airline dataset [ope, 2020], subsampled to n = 222 samples, leads to n/d = 222 /4 =

I Additional Experiments We now provide additional experiments for a setting where the ratio n/d is extremely large, rep- resenting a setting where sketching is most beneficial in terms of runtime. In particular, we use the airline dataset [ope, 2020], subsampled to n = 222 samples, leads to n/d = 222 /4 =

2020

[23] [23]

We sweep across values of k2 max{d,log(4T/ϱ)} ∈ [4, 1200] and fix T = 4 , k1 = 6 ·max{d,log( 4T/ϱ)} and k = 6 ·max{d,log( 4T/ϱ)}. As seen in Figure 4, in this regime, FastIHM allows for attaining close-to- optimal performance at a significantly lower runtime, demonstrating the most favorable regime for sketching. J Dataset Statistics The next table summar...

2018