Differentially Private Datastore Generation for Retrieval-Augmented Inference

Abdelrahman Abouelenein; Marwan Torki

arxiv: 2606.01413 · v1 · pith:EG5EJRSBnew · submitted 2026-05-31 · 💻 cs.CR · cs.IR · cs.LG

Pith reviewed 2026-06-28 16:37 UTC · model grok-4.3

classification 💻 cs.CR · cs.IR · cs.LG

keywords differential privacy · locality-sensitive hashing · retrieval-augmented inference · datastore generation · membership inference attacks · privacy-preserving retrieval · on-device AI systems

The pith

A locality-sensitive hashing framework with added differential privacy noise releases datastores for retrieval-augmented inference while limiting average accuracy loss to 2.6 percent at epsilon equals 5.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method to create and release differentially private datastores for retrieval-augmented inference pipelines in on-device AI systems. It partitions high-dimensional data into buckets via locality-sensitive hashing, then adds calibrated noise to the vote counts within each bucket to produce class probability distributions that satisfy differential privacy. A sympathetic reader would care because the formal privacy guarantee limits how well an adversary can distinguish any individual data point, while the reported experiments show the utility cost remains small across multiple datasets. The work also demonstrates reduced success for membership inference attacks on the released datastore.

Core claim

The authors introduce a hashing-based probability generation framework that employs locality-sensitive hashing to efficiently partition high-dimensional data into buckets, adds calibrated differential privacy noise to the accumulated vote for each bucket, and generates a probability distribution across classes from the noised counts. This framework supports the creation and release of differentially private datastores and applies to any pipeline that requires secure key-value datastore creation and release.

What carries the argument

Locality-sensitive hashing partitions high-dimensional data into buckets whose vote counts receive calibrated differential privacy noise to produce class probability distributions.
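
The mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not specify the hash family, the noise distribution, or the sensitivity bound, so the sketch assumes sign-random-projection hashing, the Laplace mechanism, and add/remove-one-record neighbors (L1 sensitivity 1). The names `sign_bits_key`, `laplace`, `build_dp_datastore`, and the parameter `n_bits` are hypothetical.

```python
import random

def sign_bits_key(vec, planes):
    """Pack the signs of the projections onto each hyperplane into an integer key."""
    key = 0
    for i, plane in enumerate(planes):
        if sum(v * p for v, p in zip(vec, plane)) > 0:
            key |= 1 << i
    return key

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def build_dp_datastore(X, y, n_classes, n_bits=4, epsilon=5.0, rng=None):
    """Return (planes, datastore); datastore[key] is a noisy class distribution."""
    # Assumption: `random` is not a cryptographic RNG; a real release would
    # need a secure, non-reproducible noise source.
    if rng is None:
        rng = random.Random()
    dim = len(X[0])
    # Hyperplanes are drawn independently of the data.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Count class votes in ALL 2**n_bits buckets, so the released key set
    # does not depend on which buckets happen to be occupied.
    counts = [[0.0] * n_classes for _ in range(2 ** n_bits)]
    for vec, label in zip(X, y):
        counts[sign_bits_key(vec, planes)][label] += 1.0

    datastore = []
    for row in counts:
        # Assumed sensitivity 1: one record changes one count by 1,
        # so the Laplace scale is 1 / epsilon.
        noisy = [max(0.0, c + laplace(1.0 / epsilon, rng)) for c in row]
        total = sum(noisy)
        # Clipping and normalizing are post-processing of the noisy counts.
        if total > 0:
            datastore.append([v / total for v in noisy])
        else:
            datastore.append([1.0 / n_classes] * n_classes)
    return planes, datastore
```

At inference time a query would be hashed with the same `planes`, and `datastore[sign_bits_key(query, planes)]` read as its class distribution. Noising every bucket, including empty ones, is a design choice of this sketch; the page does not say how the paper handles empty buckets.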

If this is right

At epsilon equals 5 the released DP datastore achieves strong privacy protection with only an average 2.6 percent drop in accuracy across seven datasets with 2 to 14 classes.
The DP datastore reduces membership inference attack accuracy to 53.60 percent.
The framework applies to any pipeline requiring secure key-value datastore creation and release.
The approach works on datasets with varying sample sizes and class counts ranging from 2 to 14.
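
To read the 53.60 percent figure, it helps to see how attack accuracy is typically scored. The sketch below assumes a simple threshold attack evaluated on equal numbers of members and non-members, in which case random guessing scores 50 percent. Both the attack and the balanced setup are assumptions; the page does not describe the paper's attack protocol, and the scores are hypothetical.

```python
def attack_accuracy(member_scores, nonmember_scores, threshold):
    """Accuracy of an attack that predicts 'member' when score >= threshold."""
    hits = sum(s >= threshold for s in member_scores)
    hits += sum(s < threshold for s in nonmember_scores)
    return hits / (len(member_scores) + len(nonmember_scores))

# Hypothetical attack scores (e.g. confidence on the true class); not from the paper.
members = [0.9, 0.7, 0.6, 0.4]
nonmembers = [0.8, 0.5, 0.3, 0.2]
print(attack_accuracy(members, nonmembers, threshold=0.55))  # 0.75
```

Under this balanced setup, an attack accuracy near 0.5 means the attacker does little better than a coin flip.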

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If bucket vote counts remain useful after noise addition without per-dataset tuning, the same structure could support privacy-preserving retrieval in additional high-dimensional domains such as image or sensor data.
The reported attack resilience at a single epsilon value suggests the method could be combined with other privacy tools to reach stricter protection levels.
Verification on datasets with class counts outside the tested range of 2 to 14 would clarify how broadly the accuracy preservation holds.

Load-bearing premise

Locality-sensitive hashing produces buckets whose vote counts can be noised with standard differential privacy mechanisms while preserving downstream retrieval utility across arbitrary high-dimensional datasets without additional dataset-specific tuning.

What would settle it

Running the method on an unseen high-dimensional dataset and observing either an accuracy drop substantially larger than 2.6 percent at epsilon equals 5 or a membership inference attack accuracy well above 53.60 percent would show the general claim does not hold.

Figures

Figures reproduced from arXiv: 2606.01413 by Abdelrahman Abouelenein, Marwan Torki.

Original abstract

It is crucial for modern on-device AI systems that rely on retrieval-augmented inference to release and share datastores without compromising individual privacy. This can be achieved using Differential Privacy (DP), which provides a formal guarantee that ensures individual contributions remain indistinguishable, even under adversarial analysis. In this paper, we introduce a hashing-based probability generation framework designed to enable the creation and release of differentially private datastores. Our approach employs locality-sensitive hashing (LSH) to efficiently partition high-dimensional data into buckets. We then add calibrated DP noise to the accumulated vote for each bucket, generating a probability distribution across classes. Our method is broadly applicable to any pipeline requiring secure key-value datastore creation and release. We conducted experiments on seven datasets with varying sample sizes and class counts, ranging from 2 to 14. At epsilon=5, our released DP datastore achieves strong privacy protection with only an average 2.6% drop in accuracy. Finally, we benchmark DP datastore resilience to membership inference attacks, reducing attack accuracy to 53.60%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a basic LSH bucketing plus per-bucket vote noising pipeline for releasing DP datastores, with modest reported accuracy retention on classification tasks but thin supporting details.

The core contribution is a pipeline that uses locality-sensitive hashing to group high-dimensional points into buckets, then adds calibrated noise to the per-bucket class vote counts before releasing a probability distribution. The authors position this as a general method for any secure key-value datastore release in on-device retrieval settings.

It does run the approach across seven datasets that vary in size and number of classes (2 to 14), and it reports an average 2.6% accuracy drop at epsilon=5 along with a reduction in membership-inference attack success to 53.6%. Those numbers give a practical data point for people who need to share retrieval data under privacy constraints.

The main weaknesses are the missing pieces that make the results hard to assess. The abstract supplies no description of the noise mechanism, how sensitivity was bounded, what LSH parameters were chosen, or whether they required per-dataset adjustment. The stress-test concern about LSH collision behavior in high dimensions therefore stands: without evidence that the method works without heavy tuning, the 2.6% figure may be tied to favorable datasets rather than general retrieval utility. The evaluation also focuses on classification accuracy rather than standard retrieval metrics such as recall@K across a range of K, which narrows the claim.
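
For readers unfamiliar with the retrieval metric named above, a minimal sketch of recall@K follows. The function name `recall_at_k` and the document IDs are hypothetical; nothing here comes from the paper's evaluation.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear among the top-k retrieved items."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Hypothetical ranked results and ground-truth relevant items.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=2))  # 0.5
print(recall_at_k(ranked, relevant, k=4))  # 1.0
```

Reporting this value at several K would show how retrieval quality changes with depth, which a single classification accuracy figure does not capture.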

This paper is mainly for practitioners who need a lightweight way to release private datastores for on-device RAG-style systems. A reader already working on DP data release could extract the experimental setup and try to reproduce it, but the lack of calibration details and statistical reporting limits how far the results can be trusted without the full manuscript.

I would send it for peer review once the authors add the missing implementation and sensitivity analysis; the idea is straightforward enough that a careful referee could check whether the utility claims survive closer scrutiny.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a locality-sensitive hashing (LSH) framework for creating differentially private datastores: high-dimensional data is partitioned into buckets via LSH, calibrated noise is added to per-bucket class vote counts, and the resulting probability distribution over classes is released for use in retrieval-augmented inference. Experiments on seven datasets (2–14 classes) are reported to yield an average 2.6% accuracy drop at ε=5 while reducing membership-inference attack accuracy to 53.60%. The method is presented as broadly applicable to any secure key-value datastore pipeline.

Significance. If reproducible and generalizable without per-dataset LSH tuning, the approach would offer a lightweight, standard-DP mechanism for private datastore release in on-device retrieval-augmented systems. The use of LSH for efficient bucketing followed by vote noising is a reasonable design choice that could be adopted if the utility bound holds across arbitrary high-dimensional distributions.

major comments (2)

[Abstract] The headline claims (2.6% average accuracy drop at ε=5; MIA accuracy 53.60%) are stated without any description of LSH parameters (number of hashes, tables, or bucket width), noise mechanism (Laplace/Gaussian scale, sensitivity calculation), calibration procedure, or statistical tests/error bars. This directly prevents verification of the reported figures and is load-bearing for the central empirical claim.
[Abstract] The assertion that the method is 'broadly applicable to any pipeline requiring secure key-value datastore creation' rests on the unexamined assumption that untuned LSH buckets yield vote counts whose sensitivity is bounded and whose noisy probabilities preserve downstream utility on arbitrary high-dimensional data. No evidence or discussion addresses the known dependence of LSH collision probabilities on data distribution and hash-family parameters, which is the precise point raised by the stress-test concern.
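
To make the first comment concrete, the arithmetic a reader would need is short. For the Laplace mechanism, the noise scale is the L1 sensitivity divided by ε, and the standard deviation is √2 times that scale. The sketch below assumes a sensitivity of 1 for per-bucket vote counts; that value is an assumption, since the page does not state the paper's sensitivity bound.

```python
import math

def laplace_scale(sensitivity, epsilon):
    """Scale b of Laplace noise for an epsilon-DP release of a query with this L1 sensitivity."""
    return sensitivity / epsilon

def laplace_std(sensitivity, epsilon):
    """Standard deviation of Laplace(0, b) noise, which is sqrt(2) * b."""
    return math.sqrt(2.0) * laplace_scale(sensitivity, epsilon)

# Assumed sensitivity of 1 (one record changes one vote count by 1).
for eps in (0.5, 1.0, 5.0):
    print(eps, laplace_scale(1.0, eps), round(laplace_std(1.0, eps), 3))
```

Under that assumption, ε = 5 gives a scale of 0.2 and a standard deviation of about 0.283 votes per bucket-class cell, so the noise only matters for buckets holding very few points.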

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the abstract requires additional technical details to allow verification of the reported results and will revise it accordingly. We also agree to strengthen the discussion of LSH assumptions for the broad-applicability claim.

Referee: [Abstract] The headline claims (2.6% average accuracy drop at ε=5; MIA accuracy 53.60%) are stated without any description of LSH parameters (number of hashes, tables, or bucket width), noise mechanism (Laplace/Gaussian scale, sensitivity calculation), calibration procedure, or statistical tests/error bars. This directly prevents verification of the reported figures and is load-bearing for the central empirical claim.

Authors: We agree that the abstract must be self-contained for the central claims. In the revised version we will add a concise clause specifying the LSH configuration (number of hash functions, tables, and bucket width), the noise mechanism (Laplace noise scaled to the sensitivity of per-bucket vote counts), the calibration procedure, and that error bars reflect standard deviation over repeated trials. These parameters are already fixed and reported in the experimental setup; the revision will simply surface them in the abstract.

Revision: yes

Referee: [Abstract] The assertion that the method is 'broadly applicable to any pipeline requiring secure key-value datastore creation' rests on the unexamined assumption that untuned LSH buckets yield vote counts whose sensitivity is bounded and whose noisy probabilities preserve downstream utility on arbitrary high-dimensional data. No evidence or discussion addresses the known dependence of LSH collision probabilities on data distribution and hash-family parameters, which is the precise point raised by the stress-test concern.

Authors: Our experiments cover seven datasets that vary in dimensionality, sample size, and number of classes (2–14), providing empirical support for utility preservation under the chosen LSH parameters. Nevertheless, we accept that an explicit discussion of LSH collision-probability dependence on data distribution is warranted. We will add a short paragraph in the Discussion section that (i) recalls the standard LSH collision bounds, (ii) states the parameter-selection heuristic used, and (iii) notes the empirical stability observed across the tested distributions. The broad-applicability phrasing will be qualified to reflect this scope.

Revision: partial
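
As an illustration of the collision-probability dependence at issue, consider sign-random-projection hashing, where two vectors at angle θ agree on a single hash bit with probability 1 − θ/π, and on all of k independent bits with that value raised to the power k. The page does not say which hash family the paper uses, so this family is an assumption, and the function name is hypothetical.

```python
import math

def simhash_collision_prob(u, v, n_bits=1):
    """Probability that u and v agree on all n_bits sign-random-projection bits."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return (1.0 - theta / math.pi) ** n_bits

print(simhash_collision_prob([1.0, 0.0], [0.0, 1.0]))            # 0.5
print(simhash_collision_prob([1.0, 0.0], [0.0, 1.0], n_bits=4))  # 0.0625
```

The example shows why bucket purity depends on both the angular spread of the data and the number of bits: orthogonal points share a bucket half the time with one bit but only about 6 percent of the time with four.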

Circularity Check

0 steps flagged

No circularity; empirical method with no derivations or self-referential predictions

The paper describes an LSH-based bucketing approach followed by addition of calibrated DP noise to per-bucket vote counts, then evaluates the resulting datastore empirically on seven classification datasets. No equations, derivations, or 'predictions' appear in the provided text. The reported 2.6% accuracy drop and MIA numbers are direct experimental outcomes rather than quantities forced by construction from fitted parameters or self-citations. The central claim therefore rests on external benchmarks and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach implicitly relies on standard properties of LSH and DP mechanisms whose details are not stated.

pith-pipeline@v0.9.1-grok · 5712 in / 1037 out tokens · 26404 ms · 2026-06-28T16:37:24.259439+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references · 4 canonical work pages · 1 internal anchor

[1] Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership Inference Attacks Against Machine Learning Models. In: IEEE Symposium on Security and Privacy (S&P), pp. 3–18 (2017)
[2] Tang, X., Shin, R., Inan, H.A., Manoel, A., Mireshghallah, F., Lin, Z., Gopi, S., et al.: Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. In: ICLR (2024)
[3] Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L., Lewis, M.: Generalization through Memorization: Nearest Neighbor Language Models. In: ICLR (2020)
[4] Wu, T., Panda, A., Wang, J.T., Mittal, P.: Privacy-Preserving In-Context Learning for Large Language Models. In: ICLR (2024)
[5] Zhang, X., Zhao, J., LeCun, Y.: Character-level Convolutional Networks for Text Classification. In: NeurIPS, vol. 28 (2015)
[6] Hu, M., Liu, B.: Mining and Summarizing Customer Reviews. In: Proc. ACM SIGKDD, pp. 168–177 (2004)
[7] Voorhees, E.M., Tice, D.M.: Building a Question Answering Test Collection. In: Proc. ACM SIGIR, pp. 200–207 (2000)
[8] Pang, B., Lee, L.: Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In: Proc. ACL, pp. 115–124 (2005)
[9] Pang, B., Lee, L.: A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In: Proc. ACL, pp. 271–278 (2004)
[10] Zhu, Y., Yu, X., Chandraker, M., Wang, Y.-X.: Private-kNN: Practical Differential Privacy for Computer Vision. In: CVPR, pp. 11851–11859 (2020)
[11] Xu, B., Xie, Y., Gu, Z., Zhu, K., Peng, S.: kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference. In: ICLR (2023)
[12] Dai, Y., Zhang, Z., Liu, Q., Cui, Q., Li, W., Du, Y., Xu, T.: Simple and Scalable Nearest Neighbor Machine Translation. arXiv preprint arXiv:2302.12188 (2023)
[13] Khandelwal, U., Fan, A., Jurafsky, D., Zettlemoyer, L., Lewis, M.: Nearest Neighbor Machine Translation. In: ICLR (2021)
[14] Wiebe, J., Wilson, T., Cardie, C.: Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation, 39(2–3), pp. 165–210 (2005)
[15] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., et al.: Language Models are Few-Shot Learners. In: NeurIPS, pp. 1877–1901 (2020)
[16] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language Models are Unsupervised Multitask Learners. OpenAI Technical Report (2019)
[17] Igamberdiev, T., Habernal, I.: DP-BART for Privatized Text Rewriting under Local Differential Privacy. In: Findings of the Association for Computational Linguistics, pp. 13914–13934 (2023)
[18] Ponomareva, N., Kurakin, A., Chien, S., Thakurta, A., Matthews, P.: How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy. arXiv preprint arXiv:2303.00654 (2023)
[19] Cheng, M., Zhao, W.X., Zhang, J., Wen, J.-R.: A Survey on Knowledge-Oriented Retrieval-Augmented Generation. arXiv preprint arXiv:2503.10677 (2025)
[20] Igamberdiev, T., Habernal, I.: DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting. In: COLING, pp. 2927–2933 (2022)
[21] Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep Learning with Differential Privacy. In: ACM CCS (2016)
[22] Indyk, P., Motwani, R.: Approximate Nearest Neighbors: Toward Removing the Curse of Dimensionality. In: STOC, pp. 604–613 (1998)
[23] Dwork, C., Roth, A.: The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), pp. 211–407 (2014)
[24] Andoni, A., Dadush, D., Klein, N., Liu, K., Zhang, L.: Differentially Private Approximate Near Neighbor Counting in High Dimensions. In: NeurIPS (2023)
[25] Aumüller, M., Gollapudi, S., Pagh, R., Silvestri, F.: Differentially Private High-Dimensional Approximate Range Counting, Revisited. In: Symposium on Foundations of Responsible Computing (2025)
[26] Kenthapadi, K., Korolova, A., Mironov, I., Mishra, N.: Differential Privacy with Locality-Sensitive Hashing. In: ACM SIGKDD (2012)
[27] Chan, T.-H.H., Li, M., Shi, E., Xu, W.: Differentially Private Approximate Range Counting in High Dimensions. In: ICALP (2011)
[28] Abouelenin, A., Ashfaq, A., Atkinson, A., Awadalla, H., Bach, N., et al.: Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs. arXiv preprint arXiv:2503.01743 (2025)
[29] Turati, F., Kubicek, K., Cotrini, C., Basin, D.: Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google’s FLoC and the MinHash Hierarchy System. In: Proceedings on Privacy Enhancing Technologies, 2023(4), pp. 117–131 (2023)
[30] Torki, M.: A Document Descriptor using Covariance of Word Vectors. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 527–532 (2018)
