pith. machine review for the scientific record.

arxiv: 2605.02977 · v1 · submitted 2026-05-03 · 💻 cs.CR

Recognition: 2 theorem links


Contrastive Privacy: A Semantic Approach to Measuring Privacy of AI-based Sanitization

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:25 UTC · model grok-4.3

classification 💻 cs.CR
keywords contrastive privacy · AI sanitization · semantic privacy measurement · CLIP embeddings · concept removal · image and text privacy · label-free evaluation

The pith

Contrastive privacy defines a formal semantic test for whether AI sanitization has removed target concepts from media.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces contrastive privacy as a quantitative definition that checks if specific concepts have been removed from images or text after AI sanitization. It works by using models like CLIP to compare each sanitized item against other samples from the same collection, determining concept presence through latent semantic connections rather than direct labels. This approach operates independently of the sanitization method or the model that performed it and applies to both images and text. A sympathetic reader would care because existing AI sanitizers lack any systematic way to verify their privacy claims, leaving evaluations ad hoc. If the definition holds, it supplies a consistent way to measure and compare how well different sanitizers succeed at concept removal.
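
A minimal sketch of the corpus-contrast idea described above, assuming precomputed CLIP-style embeddings and using cosine similarity as a stand-in semantic measure. The function name, the leave-one-out baseline, and the margin are editorial assumptions, not the paper's specification:

    import numpy as np

    def contrastive_flags(sanitized_embs, concept_emb, margin=0.05):
        # Judge concept presence by contrast with the rest of the corpus,
        # not by manual labels: flag any item whose similarity to the
        # target concept exceeds its peers' average by a margin.
        E = np.array(sanitized_embs, dtype=float)
        c = np.asarray(concept_emb, dtype=float)
        sims = (E @ c) / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
        flags = []
        for i, s in enumerate(sims):
            baseline = (sims.sum() - s) / (len(sims) - 1)
            flags.append(bool(s > baseline + margin))
        return flags

A flagged item is one the corpus itself still "recognizes" as carrying the concept, which is what lets the test run without labels.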

Core claim

Contrastive privacy provides a formal definition of privacy that supplies a systematic and quantitative test of sanitized media with a semantic interpretation. It is independent of the model and mechanism used and operates across multiple media modalities. The definition supplies guarantees under ideal conditions, and the paper shows how to operationalize it with imperfect semantic measures from models like CLIP that connect concepts latently. The algorithm contrasts sanitized media with other items from the same corpus to reach a determination, without requiring any manual labeling.

What carries the argument

Contrastive privacy, a formal definition that tests sanitized media by contrasting it with other corpus samples using latent semantic embeddings from models like CLIP to detect remaining target concepts without manual labels.

If this is right

  • The test can quantify overall sanitization success across 34 combinations of models on images and 15 on text.
  • It can pinpoint specific failures within a sanitized collection rather than only giving an aggregate score (see the sketch after this list).
  • The evaluation remains valid regardless of whether sanitization uses pixel darkening, blurring, or more advanced obfuscation techniques.
  • The same procedure applies equally to image and text data using current frontier models for both concept generation and sanitization.
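
On the second point, failure pinpointing falls out of any per-item statistic. A hypothetical helper in the same embedding setting as the sketch above; the paper's actual ranking is not specified here:

    import numpy as np

    def rank_privacy_failures(sanitized_embs, concept_emb, top_k=5):
        # Rank sanitized items by residual cosine similarity to the target
        # concept so the worst leaks surface individually instead of being
        # averaged into one aggregate score.
        E = np.array(sanitized_embs, dtype=float)
        c = np.asarray(concept_emb, dtype=float)
        sims = (E @ c) / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
        order = np.argsort(-sims)[:top_k]
        return [(int(i), float(sims[i])) for i in order]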

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The corpus-contrast method might produce distribution-dependent results if the collection itself lacks diversity in the target concepts (a diagnostic for this is sketched after this list).
  • Extending the approach to video would require semantic models that can compare temporal sequences rather than static frames.
  • Regulators could adopt the definition as a required benchmark for commercial AI sanitization tools.
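
For the first point, a cheap diagnostic suggests itself: if every item in the collection is about equally similar to the target concept, corpus-relative contrast has nothing to work with. A sketch under the same embedding assumptions as above, with the spread threshold chosen arbitrarily:

    import numpy as np

    def corpus_contrast_available(corpus_embs, concept_emb, min_spread=0.1):
        # Reliable contrast needs variation in concept similarity across
        # the corpus; near-constant similarity means any corpus-relative
        # determination is fragile.
        E = np.array(corpus_embs, dtype=float)
        c = np.asarray(concept_emb, dtype=float)
        sims = (E @ c) / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
        return float(sims.max() - sims.min()) >= min_spread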

Load-bearing premise

That semantic similarity measures from models like CLIP reliably indicate whether the target concepts to be sanitized are still present or absent after processing.
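
The premise is directly probeable with public models. A minimal check using the Hugging Face CLIP API rather than the paper's code; the checkpoint is a real public model, while the file names and concept prompt are placeholders. If the premise holds, the image-concept score should drop sharply after sanitization:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def concept_score(image_path, concept):
        # Scaled image-text similarity from CLIP's joint embedding space.
        inputs = processor(text=[concept], images=[Image.open(image_path)],
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            return model(**inputs).logits_per_image.item()

    before = concept_score("original.jpg", "a photo of Leonardo DiCaprio")
    after = concept_score("sanitized.jpg", "a photo of Leonardo DiCaprio")
    print(f"concept score: {before:.2f} -> {after:.2f}")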

What would settle it

A test set where human judges confirm that a target concept remains in some sanitized items, yet the contrastive procedure reports those items as fully sanitized, or the reverse.
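
Concretely, such a study reduces to tallying the two disagreement directions between human judgments and the procedure's output; a hypothetical tally, where either count being large would be damaging:

    def disagreements(human_says_present, test_says_sanitized):
        # Items a human still sees the concept in but the contrastive test
        # reports as fully sanitized, and the reverse direction.
        pairs = list(zip(human_says_present, test_says_sanitized))
        missed = sum(h and t for h, t in pairs)
        overflagged = sum(not h and not t for h, t in pairs)
        return missed, overflagged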

Figures

Figures reproduced from arXiv: 2605.02977 by Brian Neil Levine, Eugene Bagdasarian, George Bissias.

Figure 1: Contrastive privacy versus utility (as similarity) for …
Figure 2: Contrastive privacy measures success via In…
Figure 3: A sanitized image of John Kerry and Leonardo …
Figure 4: (left) 30 images capturing Leonardo DiCaprio sanitized of …
Figure 5: Privacy resolution and utility for 30 images captur…
Figure 7: Privacy resolution (red) and utility (blue) for 49 …
Figure 8: The most significant privacy failure for mechanism …
Figure 10: An attempt to sanitize the concept the identity of the fast food restaurant from a collection of nine images capturing the brand McDonald’s. (a) Unaltered images capturing the McDonald’s brand. (b) Images with Concepts 5 obfuscated by iGPT15 achieving contrastive privacy at resolution 0.17…
Figure 12: A privatized collection of 49 images depicting the McDonald’s restaurant. Resolutions are calculated using the …
Figure 13: The most significant privacy failures for 30 images capturing Leonardo DiCaprio and sanitizing the abstract concept …
Figure 14: The most significant privacy failures for 49 images capturing the McDonald’s brand and sanitizing the abstract concept …
original abstract

To sanitize specific concepts from imagery and text, privacy mechanisms with formal guarantees are often eschewed in practice in favor of more intuitive techniques. AI-based sanitization is poised to grow in popularity because it can work with the semantics of natural language concepts; e.g., a prompt to "remove faces, clothing, and body shape". Many approaches exist commercially and as prior work. But, the evaluation of such approaches has been bespoke and without formal guarantees. To fill this gap, we propose contrastive privacy, a formal definition of privacy that provides a systematic and quantitative test of sanitized media that has a semantic interpretation. It is independent of the model and mechanism used and operates across multiple media modalities. Contrastive privacy provides guarantees under ideal conditions; and we show how to operationalize the definition with imperfect measures of semantics, provided by models like CLIP, that can connect concepts latently. Notably, the algorithm contrasts sanitized media with other images from the same corpus to arrive at a determination; no manual labeling is involved. In our experiments, we apply our privacy test to both images and text using frontier models: some generate concepts to sanitize and others perform the sanitization. With our test we quantify sanitization success across 34 combinations of models on images, and for 15 models on text. The approach not only quantifies success overall, it identifies specific failures from a sanitized corpus. Further, it is independent of the mechanism used for sanitization, whether by darkening pixels, blurring, or applying more advanced means of obfuscation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes contrastive privacy, a formal semantic definition providing a quantitative, model- and mechanism-independent test for privacy in AI-based sanitization of specific concepts from images and text. It operationalizes the definition using latent embeddings (e.g., from CLIP) by contrasting sanitized media against other items in the same corpus, avoiding manual labeling. Experiments apply the test to 34 image model combinations and 15 text model combinations, claiming to quantify sanitization success overall while identifying specific failures, with guarantees under ideal conditions and applicability across mechanisms such as pixel darkening and more advanced obfuscation.

Significance. If the central claims hold, the work offers a promising standardized framework for evaluating semantic privacy in AI sanitization, moving beyond bespoke methods. The model-independence, cross-modal scope, and scale of the 34+15 model evaluations are credited as strengths for practical adoption in privacy assessment.

major comments (2)
  1. [Abstract] Abstract and definition section: the claim of formal guarantees under ideal conditions and independence from the sanitization mechanism lacks any derivation steps, proof sketches, or quantitative backing in the provided outline, which is load-bearing for verifying the semantic interpretation and quantitative test.
  2. [Experiments] Operationalization and experiments: contrasting sanitized items against the same corpus (as described for the algorithm) risks distribution-dependent artifacts and false negatives when the corpus lacks diverse negative examples for target concepts; this directly challenges the claimed mechanism independence and reliability of the privacy determination across the 34 image and 15 text combinations.
minor comments (2)
  1. [Method] Clarify the precise computation of the contrast statistic or similarity threshold from embeddings to support reproducibility (one candidate form is sketched after this list).
  2. [Discussion] Add discussion of sensitivity to corpus composition or alternative contrast strategies to address potential biases.
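
For minor comment 1, one candidate form the contrast statistic could take, offered as an editorial sketch rather than the paper's definition; the z-score form and any threshold on it are assumptions:

    import numpy as np

    def contrast_zscore(item_emb, corpus_embs, concept_emb):
        # Standardized concept similarity: how many corpus standard
        # deviations the item sits above the corpus mean. Fixing a
        # threshold on this value would make determinations reproducible.
        E = np.array(corpus_embs, dtype=float)
        x = np.asarray(item_emb, dtype=float)
        c = np.asarray(concept_emb, dtype=float)
        sims = (E @ c) / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
        item = float(x @ c / (np.linalg.norm(x) * np.linalg.norm(c)))
        return (item - sims.mean()) / (sims.std() + 1e-8)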

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential of contrastive privacy as a standardized framework. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

point-by-point responses
  1. Referee: [Abstract] Abstract and definition section: the claim of formal guarantees under ideal conditions and independence from the sanitization mechanism lacks any derivation steps, proof sketches, or quantitative backing in the provided outline, which is load-bearing for verifying the semantic interpretation and quantitative test.

    Authors: We agree that explicit derivation steps are necessary to substantiate the claims of formal guarantees and mechanism independence. The contrastive privacy definition is constructed so that, when the semantic measure is ideal (perfectly distinguishing the target concept), the privacy metric evaluates to zero precisely when the concept has been removed from the sanitized media, and this holds irrespective of the underlying sanitization mechanism since it operates solely on the resulting embeddings. In the revised version, we will add a proof sketch in the definition section demonstrating this property under ideal conditions (stated compactly in the sketch after these responses), along with a discussion of how imperfect measures (such as those from CLIP) approximate it. This will also reinforce the model-independence of the test. revision: yes

  2. Referee: [Experiments] Operationalization and experiments: contrasting sanitized items against the same corpus (as described for the algorithm) risks distribution-dependent artifacts and false negatives when the corpus lacks diverse negative examples for target concepts; this directly challenges the claimed mechanism independence and reliability of the privacy determination across the 34 image and 15 text combinations.

    Authors: We acknowledge the potential for corpus-dependent artifacts in the operationalization. The test is intentionally relative to the corpus to avoid manual labeling, using other items as negative references for the target concept. Mechanism independence remains intact because the privacy assessment depends only on the semantic similarity of the sanitized output to corpus items, not on the specific sanitization technique (e.g., pixel darkening or advanced obfuscation). However, we recognize that insufficient diversity in negative examples could lead to unreliable determinations. In the revision, we will add a section discussing corpus requirements for reliable testing, describe the diversity of the datasets used in our 34 image and 15 text experiments, and note that practitioners should ensure representative corpora. This addresses the reliability concern without altering the core claims. revision: partial
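
The ideal-condition property promised in response 1 can be stated compactly. An editorial formalization, not the paper's proof: $m$ denotes the hypothetical ideal semantic measure and $S$ an arbitrary sanitizer.

    Let $m(y, c) \in [0, 1]$ be ideal: $m(y, c) = 0 \iff c \notin \mathrm{concepts}(y)$.
    If the privacy metric on sanitized output $y = S(x)$ is a function of $m$ alone,
    with $\mathrm{priv}(y, c) = 0 \iff m(y, c) = 0$, then
    $$\mathrm{priv}(S(x), c) = 0 \iff c \notin \mathrm{concepts}(S(x)) \quad \text{for every } S,$$
    since the metric depends on the sanitizer only through its output $y$.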

Circularity Check

0 steps flagged

No circularity: contrastive privacy definition stands independently of its operationalization

full rationale

The paper proposes contrastive privacy as a new formal definition providing semantic guarantees under ideal conditions, then separately shows an operationalization using external models (e.g., CLIP) and corpus contrasts for label-free testing. No derivation step reduces the claimed result to its inputs by construction, no parameters are fitted then renamed as predictions, and no load-bearing claims rest on self-citations or imported uniqueness theorems. The corpus contrast is presented as an explicit methodological choice rather than a tautological loop, leaving the central definition self-contained against external semantic measures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review limited to abstract; full derivation and parameter details unavailable. The approach rests on the domain assumption that latent models capture semantics sufficiently for privacy testing.

axioms (1)
  • domain assumption: Latent semantic models such as CLIP can connect and measure natural language concepts in sanitized media.
    Invoked to operationalize the formal definition with imperfect measures.

pith-pipeline@v0.9.0 · 5580 in / 1222 out tokens · 46541 ms · 2026-05-08T19:25:18.114359+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 16 canonical work pages · 3 internal anchors

Code availability (from the paper's appendix): All artifacts needed to evaluate this paper's core contributions are available in the following GitHub repository: https://github.com/umass-forensics/contrastive-priv...