pith. sign in

arxiv: 2605.23131 · v1 · pith:B6MNHYAUnew · submitted 2026-05-22 · 💻 cs.LG

When Determinants Are Not Enough: Private Rare Switching

Pith reviewed 2026-05-25 05:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords private linear banditsrare switchinggeneralized Rayleigh quotientdifferential privacyregret analysisdesign matrixpolicy updatesconfidence ellipsoid
0
0 comments X

The pith

A generalized Rayleigh quotient rule restores logarithmic rare switching and confidence-width comparisons in private linear bandits after privacy noise breaks determinant monotonicity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard rare-switching methods in linear bandits and RL trigger policy updates when the determinant of the accumulating design matrix grows by a fixed factor. This works because the determinant tracks volume growth monotonically, which suffices for regret bounds. Adding Gaussian noise for differential privacy can make the determinant non-monotonic, so the usual volume argument no longer guarantees the required control over the worst direction in the confidence ellipsoid. Replacing the determinant trigger with a generalized Rayleigh quotient criterion restores both the logarithmic number of switches and the necessary width comparison, up to a constant factor. A reader would care because the fix lets private algorithms keep the sample-efficiency advantages of rare switching without new regret terms.

Core claim

The generalized Rayleigh quotient supplies a rare-switching rule that, even after Gaussian privacy noise perturbs the design matrix, keeps the number of policy updates logarithmic in the horizon and ensures that the maximum directional width of the confidence ellipsoid satisfies the comparison needed for regret analysis up to a constant multiplicative factor.

What carries the argument

The generalized Rayleigh quotient rare-switching rule, which monitors the largest directional stretch of the perturbed matrix rather than its volume.

If this is right

  • Only a logarithmic number of policy updates occur over the entire horizon even when privacy noise is present.
  • Regret bounds for private linear bandits and RL follow from the same analysis as the non-private case, multiplied by a fixed constant.
  • The same switching schedule works for both linear bandits and the linear-function-approximation RL setting.
  • Privacy noise no longer forces a switch to linear-in-horizon update schedules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same directional-control idea could apply to other adaptive algorithms whose analysis relies on monotonic matrix growth but must tolerate additive noise.
  • Empirical checks on synthetic private data streams could measure how large the hidden constant actually becomes in moderate dimensions.
  • Matrix perturbation bounds already known for eigenvalues may directly quantify the constant factor lost to privacy noise.

Load-bearing premise

The generalized Rayleigh quotient still controls the worst single direction tightly enough for the regret proof to close after the privacy noise is added.

What would settle it

A calculation or simulation in which the maximum directional confidence width under the Rayleigh quotient rule exceeds the non-private width by more than a small constant factor for some sequence of private design matrices would falsify the claim.

read the original abstract

In this note, I would like to share a small research moment where Codex helped me find the right way to adapt rare switching to the private setting. The standard determinant-based update rule in linear bandits and RL works beautifully because the design matrix grows monotonically. But once Gaussian noise is added for privacy, this monotonicity can fail, and the usual analysis no longer goes through. The key reason is that determinant growth controls volume, while regret analysis needs control of the worst direction. To address this, Codex comes up with a different rare-switching rule based on the generalized Rayleigh quotient, which restores logarithmic policy updates and the desired confidence-width comparison up to a constant factor. I present my manually clean-up version of the proof here as well as some personal reflection on this example.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that Gaussian privacy noise breaks the monotonicity of the design matrix determinant, invalidating standard rare-switching rules in linear bandits and RL; it proposes a generalized Rayleigh quotient update rule that restores logarithmic policy switches and a confidence-width comparison (up to constant factor) sufficient for the regret analysis to close, and presents a manually cleaned-up proof of this claim.

Significance. If the central claim holds, the result supplies a concrete mechanism for adapting volume-control techniques to the private setting while preserving the logarithmic switch count and regret bounds needed for private linear bandits and RL. The explicit presentation of the cleaned-up proof is a strength, as it allows direct inspection of the derivation from first principles of worst-direction control.

major comments (1)
  1. [Proof section (generalized Rayleigh quotient rule)] The central claim (that the generalized Rayleigh quotient restores a confidence-width comparison up to constant factor after noise addition) is load-bearing for the regret bound, yet the manuscript does not state the precise comparison (e.g., how the maximum eigenvalue or worst-direction width is bounded relative to the non-private case) nor the dependence of the constant on noise variance and dimension. This matches the weakest assumption identified in the stress test and prevents verification that the logarithmic switch count survives.
minor comments (2)
  1. [Abstract] The title and abstract refer to 'Codex' and 'personal reflection'; for a formal journal submission these could be rephrased to focus on the technical contribution.
  2. [Proof section] Notation for the generalized Rayleigh quotient is introduced without an explicit equation label; adding an equation number would aid cross-reference in the proof.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying the need for greater explicitness in the proof of the generalized Rayleigh quotient rule. We address the major comment below and will incorporate the requested clarifications in a revised version.

read point-by-point responses
  1. Referee: [Proof section (generalized Rayleigh quotient rule)] The central claim (that the generalized Rayleigh quotient restores a confidence-width comparison up to constant factor after noise addition) is load-bearing for the regret bound, yet the manuscript does not state the precise comparison (e.g., how the maximum eigenvalue or worst-direction width is bounded relative to the non-private case) nor the dependence of the constant on noise variance and dimension. This matches the weakest assumption identified in the stress test and prevents verification that the logarithmic switch count survives.

    Authors: We agree that the manuscript would be strengthened by an explicit statement of the comparison and the dependence of the constant. The proof already establishes that the generalized Rayleigh quotient yields a confidence-width comparison up to a constant factor (via the worst-direction volume control property), but we will add a dedicated lemma in the revision that states the bound precisely: the worst-direction width after noise addition is at most C times the non-private width, where C depends on the noise variance σ² and dimension d through the term 1 + O(σ² d / λ_min(V_t)). This follows from the eigenvalue perturbation induced by the Gaussian noise and the definition of the quotient, directly implying that the logarithmic switch count is preserved up to constants independent of T. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation of generalized Rayleigh quotient rule is self-contained

full rationale

The paper derives the rare-switching rule from first principles of volume control versus worst-direction control after Gaussian noise addition. No steps reduce by construction to fitted inputs, self-citations, or renamed known results; the generalized Rayleigh quotient is introduced as a new adaptation to restore the needed comparison up to a constant factor. The central claim does not rely on load-bearing self-citation or ansatz smuggling. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the new rule appears to rest on standard linear-algebraic properties of the Rayleigh quotient.

pith-pipeline@v0.9.0 · 5644 in / 985 out tokens · 14911 ms · 2026-05-25T05:02:24.687754+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

127 extracted references · 127 canonical work pages · 7 internal anchors

  1. [1]

    International Conference on Learning Representations , volume=

    On differentially private federated linear contextual bandits , author=. International Conference on Learning Representations , volume=

  2. [2]

    Towards Differentially Private Reinforcement Learning with General Function Approximation

    Towards Differentially Private Reinforcement Learning with General Function Approximation , author=. arXiv preprint arXiv:2605.07049 , year=

  3. [3]

    International Conference on Machine Learning , pages=

    Shuffle Private Linear Contextual Bandits , author=. International Conference on Machine Learning , pages=. 2022 , organization=

  4. [4]

    Advances in neural information processing systems , volume=

    Improved algorithms for linear stochastic bandits , author=. Advances in neural information processing systems , volume=

  5. [5]

    Between Pure and Approximate Differential Privacy

    Between pure and approximate differential privacy , author=. arXiv preprint arXiv:1501.06095 , year=

  6. [6]

    arXiv preprint arXiv:2305.18465 , year=

    Federated Learning of Gboard Language Models with Differential Privacy , author=. arXiv preprint arXiv:2305.18465 , year=

  7. [7]

    Zhou, Xingyu , title =

  8. [8]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  9. [9]

    Optimizing error of high-dimensional statistical queries under differential privacy

    Optimizing error of high-dimensional statistical queries under differential privacy , author=. arXiv preprint arXiv:1808.03537 , year=

  10. [10]

    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

    Convex optimization for linear query processing under approximate differential privacy , author=. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages=

  11. [11]

    Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=

    The power of factorization mechanisms in local and central differential privacy , author=. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages=

  12. [12]

    The VLDB journal , volume=

    The matrix mechanism: optimizing linear counting queries under differential privacy , author=. The VLDB journal , volume=. 2015 , publisher=

  13. [13]

    Advances in Neural Information Processing Systems , volume=

    Improved differential privacy for sgd via optimal private linear operators on adaptive streams , author=. Advances in Neural Information Processing Systems , volume=

  14. [14]

    Theory and Practice of Differential Privacy (TPDP 2015), London, UK , volume=

    Efficient use of differentially private binary trees , author=. Theory and Practice of Differential Privacy (TPDP 2015), London, UK , volume=

  15. [15]

    Proceedings of the forty-second ACM symposium on Theory of computing , pages=

    Differential privacy under continual observation , author=. Proceedings of the forty-second ACM symposium on Theory of computing , pages=

  16. [16]

    Introducing

    Tao Qin and Tie. Introducing. CoRR , volume =. 2013 , url =

  17. [17]

    arXiv preprint arXiv:2207.03106 , year=

    A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits , author=. arXiv preprint arXiv:2207.03106 , year=

  18. [18]

    Advances in Neural Information Processing Systems , volume=

    Federated linear contextual bandits , author=. Advances in Neural Information Processing Systems , volume=

  19. [19]

    Proceedings of the 5th conference on Innovations in theoretical computer science , pages=

    Mechanism design in large games: Incentives and privacy , author=. Proceedings of the 5th conference on Innovations in theoretical computer science , pages=

  20. [20]

    International Conference on Machine Learning , pages=

    Kernel methods for cooperative multi-agent contextual bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  21. [21]

    arXiv preprint arXiv:2206.05772 , year=

    Distributed Differential Privacy in Multi-Armed Bandits , author=. arXiv preprint arXiv:2206.05772 , year=

  22. [22]

    arXiv preprint arXiv:2210.00597 , year=

    Composition of Differential Privacy & Privacy Amplification by Subsampling , author=. arXiv preprint arXiv:2210.00597 , year=

  23. [23]

    Proceedings of the 39th International Conference on Machine Learning , pages =

    Shuffle Private Linear Contextual Bandits , author =. Proceedings of the 39th International Conference on Machine Learning , pages =. 2022 , volume =

  24. [24]

    arXiv preprint arXiv:1806.06035 , year=

    Customized local differential privacy for multi-agent distributed optimization , author=. arXiv preprint arXiv:1806.06035 , year=

  25. [25]

    arXiv preprint arXiv:2208.04591 , year=

    Stronger Privacy Amplification by Shuffling for R 'enyi and Approximate Differential Privacy , author=. arXiv preprint arXiv:2208.04591 , year=

  26. [26]

    International Conference on Artificial Intelligence and Statistics , pages=

    Federated f-differential privacy , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

  27. [27]

    arXiv preprint arXiv:2206.07902 , year=

    On Privacy and Personalization in Cross-Silo Federated Learning , author=. arXiv preprint arXiv:2206.07902 , year=

  28. [28]

    arXiv preprint arXiv:2203.06735 , year=

    Private Non-Convex Federated Learning Without a Trusted Server , author=. arXiv preprint arXiv:2203.06735 , year=

  29. [29]

    International Conference on Machine Learning , pages=

    Provably efficient exploration in policy optimization , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  30. [30]

    2022 IEEE Wireless Communications and Networking Conference (WCNC) , pages=

    FedQOGD: Federated Quantized Online Gradient Descent with Distributed Time-Series Data , author=. 2022 IEEE Wireless Communications and Networking Conference (WCNC) , pages=. 2022 , organization=

  31. [31]

    Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=

    Deep learning with differential privacy , author=. Proceedings of the 2016 ACM SIGSAC conference on computer and communications security , pages=

  32. [32]

    A General Approach to Adding Differential Privacy to Iterative Training Procedures

    A general approach to adding differential privacy to iterative training procedures , author=. arXiv preprint arXiv:1812.06210 , year=

  33. [33]

    Learning Differentially Private Recurrent Language Models

    Learning differentially private recurrent language models , author=. arXiv preprint arXiv:1710.06963 , year=

  34. [34]

    ICLR , year=

    Distributed bandit learning: How much communication is needed to achieve (near) optimal regret , author=. ICLR , year=

  35. [35]

    arXiv preprint arXiv:1911.09564 , year=

    Parameter-free locally differentially private stochastic subgradient descent , author=. arXiv preprint arXiv:1911.09564 , year=

  36. [36]

    Online convex optimization over Erdos-R

    Lei, Jinlong and Yi, Peng and Hong, Yiguang and Chen, Jie and Shi, Guodong , journal=. Online convex optimization over Erdos-R

  37. [37]

    International Conference on Machine Learning , pages=

    Projection-free Distributed Online Convex Optimization with Communication Complexity , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  38. [38]

    2013 IEEE 54th Annual Symposium on Foundations of Computer Science , pages=

    Local privacy and statistical minimax rates , author=. 2013 IEEE 54th Annual Symposium on Foundations of Computer Science , pages=. 2013 , organization=

  39. [39]

    2014 IEEE 55th annual symposium on foundations of computer science , pages=

    Private empirical risk minimization: Efficient algorithms and tight error bounds , author=. 2014 IEEE 55th annual symposium on foundations of computer science , pages=. 2014 , organization=

  40. [40]

    Advances in Neural Information Processing Systems , volume=

    Federated Bayesian optimization via Thompson sampling , author=. Advances in Neural Information Processing Systems , volume=

  41. [41]

    Advances in Neural Information Processing Systems , volume=

    Differentially private federated Bayesian optimization with distributed exploration , author=. Advances in Neural Information Processing Systems , volume=

  42. [42]

    A Modern Introduction to Online Learning

    A modern introduction to online learning , author=. arXiv preprint arXiv:1912.13213 , year=

  43. [43]

    Advances in Neural Information Processing Systems , volume=

    (Nearly) optimal algorithms for private online learning in full-information and bandit settings , author=. Advances in Neural Information Processing Systems , volume=

  44. [44]

    Conference on Learning Theory , pages=

    Differentially private online learning , author=. Conference on Learning Theory , pages=. 2012 , organization=

  45. [45]

    ACM Transactions on Information and System Security (TISSEC) , volume=

    Private and continual release of statistics , author=. ACM Transactions on Information and System Security (TISSEC) , volume=. 2011 , publisher=

  46. [46]

    arXiv preprint arXiv:2202.01292 , year=

    Improved Regret for Differentially Private Exploration in Linear MDP , author=. arXiv preprint arXiv:2202.01292 , year=

  47. [47]

    , author=

    Stochastic Convex Optimization. , author=. COLT , volume=

  48. [48]

    SIAM Journal on Optimization , volume=

    Differentially private accelerated optimization algorithms , author=. SIAM Journal on Optimization , volume=. 2022 , publisher=

  49. [49]

    Advances in Neural Information Processing Systems , volume=

    Scalable generalized linear bandits: Online computation and hashing , author=. Advances in Neural Information Processing Systems , volume=

  50. [50]

    arXiv preprint arXiv:2202.01087 , year=

    Communication Efficient Federated Learning for Generalized Linear Bandits , author=. arXiv preprint arXiv:2202.01087 , year=

  51. [51]

    Advances in Neural Information Processing Systems , volume=

    Differentially-private federated linear bandits , author=. Advances in Neural Information Processing Systems , volume=

  52. [52]

    Journal of Machine Learning Research , volume=

    A contextual bandit bake-off , author=. Journal of Machine Learning Research , volume=

  53. [53]

    International Conference on Machine Learning , pages=

    Practical and private (deep) learning without sampling or shuffling , author=. International Conference on Machine Learning , pages=. 2021 , organization=

  54. [54]

    Advances in Neural Information Processing Systems , volume=

    Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge , author=. Advances in Neural Information Processing Systems , volume=

  55. [55]

    Advances in Neural Information Processing Systems , volume=

    Competitive on-line linear regression , author=. Advances in Neural Information Processing Systems , volume=

  56. [56]

    Machine Learning , volume=

    A generalized online mirror descent with applications to classification and regression , author=. Machine Learning , volume=. 2015 , publisher=

  57. [57]

    Advances in Neural Information Processing Systems , volume=

    Efficient learning of generalized linear and single index models with isotonic regression , author=. Advances in Neural Information Processing Systems , volume=

  58. [58]

    ICML , pages=

    Associative reinforcement learning using linear probabilistic concepts , author=. ICML , pages=. 1999 , organization=

  59. [59]

    International Conference on Machine Learning , pages=

    Beyond ucb: Optimal and efficient contextual bandits with regression oracles , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  60. [60]

    Mathematics of Operations Research , year=

    Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability , author=. Mathematics of Operations Research , year=

  61. [61]

    arXiv preprint arXiv:2002.01919 , year=

    Pure differentially private summation from anonymous messages , author=. arXiv preprint arXiv:2002.01919 , year=

  62. [62]

    2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=

    Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling , author=. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2022 , organization=

  63. [63]

    Annual International Cryptology Conference , pages=

    The privacy blanket of the shuffle model , author=. Annual International Cryptology Conference , pages=. 2019 , organization=

  64. [64]

    Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=

    Amplification by shuffling: From local to central differential privacy via anonymity , author=. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms , pages=. 2019 , organization=

  65. [65]

    Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=

    Private aggregation from fewer anonymous messages , author=. Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=. 2020 , organization=

  66. [66]

    Annual International Cryptology Conference , pages=

    Distributed private data analysis: Simultaneously solving how and what , author=. Annual International Cryptology Conference , pages=. 2008 , organization=

  67. [67]

    European Symposium on Algorithms , pages=

    Optimal lower bound for differentially private multi-party aggregation , author=. European Symposium on Algorithms , pages=. 2012 , organization=

  68. [68]

    Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=

    Distributed differential privacy via shuffling , author=. Annual International Conference on the Theory and Applications of Cryptographic Techniques , pages=. 2019 , organization=

  69. [69]

    2004 , institution=

    Tor: The second-generation onion router , author=. 2004 , institution=

  70. [70]

    arXiv preprint arXiv:2110.10133 , year=

    Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes , author=. arXiv preprint arXiv:2110.10133 , year=

  71. [71]

    arXiv preprint arXiv:2112.01585 , year=

    Differentially Private Exploration in Reinforcement Learning with Linear Representation , author=. arXiv preprint arXiv:2112.01585 , year=

  72. [72]

    Zhou, Xingyu , title =. Proc. ACM Meas. Anal. Comput. Syst. , month =. 2022 , issue_date =

  73. [73]

    2021 IEEE International Symposium on Information Theory (ISIT) , pages=

    Adaptive control of differentially private linear quadratic systems , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=

  74. [74]

    arXiv preprint arXiv:2112.10599 , year=

    Differentially Private Regret Minimization in Episodic Markov Decision Processes , author=. arXiv preprint arXiv:2112.10599 , year=

  75. [75]

    Advances in Neural Information Processing Systems , volume=

    Local Differential Privacy for Regret Minimization in Reinforcement Learning , author=. Advances in Neural Information Processing Systems , volume=

  76. [76]

    International Conference on Machine Learning , pages=

    Private reinforcement learning with pac and regret guarantees , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  77. [77]

    Proceedings of the AAAI Conference on Artificial Intelligence , author=

    Local Differential Privacy for Bayesian Optimization , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2021 , month=

  78. [78]

    ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

    Cascading Bandit Under Differential Privacy , author=. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2022 , organization=

  79. [79]

    International Conference on Machine Learning , pages=

    (Locally) Differentially Private Combinatorial Semi-Bandits , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  80. [80]

    International Conference on Artificial Intelligence and Statistics , pages=

    Optimal rates of (locally) differentially private heavy-tailed multi-armed bandits , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=

Showing first 80 references.