
arXiv: 2605.24307 · v1 · pith:4BBGU4JFnew · submitted 2026-05-23 · 💻 cs.HC · cs.CR · cs.CY

Modernizing User Privacy Preference Measurement through GPPI: A GDPR-aligned Privacy Preference Item Bank

Pith reviewed 2026-06-30 13:09 UTC · model grok-4.3

classification 💻 cs.HC · cs.CR · cs.CY
keywords privacy preferences · GDPR · measurement instrument · item bank · expert validation · data protection · regulatory mechanisms · user survey

The pith

A 527-item bank derived from all 99 GDPR articles measures user preferences for specific regulatory privacy protections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a measurement instrument for privacy preferences grounded directly in GDPR text rather than general concerns. Existing tools predate the regulation and do not assess support for its concrete mechanisms such as data erasure or portability rights. The authors extract statements from every article, apply expert reviews and clustering to refine them, and produce an organized bank usable at different levels of detail. A sympathetic reader would care because the bank offers a way to check whether people actually value the protections that compliant systems must implement.

Core claim

By extracting 669 statements from the 99 articles of the GDPR and validating them through two rounds of expert review plus consensus voting by 50 specialists, the work yields a final 527-item bank organized into 9 parent themes and 73 subthemes. The items achieve mean pairwise expert agreement of approximately 85 percent on coverage of the regulation. This bank supplies a complementary dimension for measuring user preferences for regulatory mechanisms instead of abstract privacy concerns.
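The summary does not say how the roughly 85 percent figure was computed. One common reading of "mean pairwise agreement" is the share of items on which two experts cast the same vote, averaged over every pair of experts. The sketch below follows that reading only; the function name and the vote matrix are hypothetical and not taken from the paper.

```python
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Average, over all expert pairs, of the fraction of items with equal votes.

    ratings: one list per expert, each holding that expert's vote per item
    (same item order for every expert).
    """
    pair_scores = []
    for a, b in combinations(ratings, 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        pair_scores.append(matches / len(a))
    return sum(pair_scores) / len(pair_scores)


# Invented votes from 5 experts on 4 items (1 = keep, 0 = drop).
votes = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
agreement = mean_pairwise_agreement(votes)
print(f"mean pairwise agreement: {agreement:.2f}")
```

Raw percent agreement does not correct for chance, so a chance-corrected statistic may give a different picture of the same votes.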

What carries the argument

The GPPI item bank, formed by extracting statements from GDPR articles and clustering them into expert-validated themes for use at varying granularities.

If this is right

  • The bank enables assessment of user valuation for concrete GDPR rights including data portability, erasure, and restrictions on automated decision-making.
  • Measurement can target broad parent themes or narrow subthemes depending on the needed level of detail.
  • Practitioners gain a tool to evaluate whether implemented privacy policies align with what users prefer under the regulation.
  • The structure supports repeated use across studies while maintaining direct ties to the full text of the 99 articles.
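One way to picture the granularity claim is a flat list of items, each tagged with a parent theme and a subtheme, filtered at whichever level a study needs. The following is a minimal sketch under that assumption; the `Item` fields, the `select_items` helper, and every theme name and statement are placeholders, not the paper's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    parent_theme: str
    subtheme: str
    text: str


# Placeholder entries: theme names and wording are invented for illustration.
BANK = [
    Item("i1", "Theme A", "Subtheme A1", "placeholder statement 1"),
    Item("i2", "Theme A", "Subtheme A2", "placeholder statement 2"),
    Item("i3", "Theme B", "Subtheme B1", "placeholder statement 3"),
]


def select_items(bank, parent_theme=None, subtheme=None):
    """Return items matching the requested level; None means no filter."""
    return [
        item
        for item in bank
        if (parent_theme is None or item.parent_theme == parent_theme)
        and (subtheme is None or item.subtheme == subtheme)
    ]


broad = select_items(BANK, parent_theme="Theme A")       # whole parent theme
narrow = select_items(BANK, subtheme="Subtheme A2")      # single subtheme
print([item.item_id for item in broad], [item.item_id for item in narrow])
```

Keeping the tags on each item, rather than nesting items inside themes, makes it simple to pull a short form from several subthemes at once.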

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bank could be administered alongside behavioral measures to check whether stated preferences predict actions like filing access requests.
  • Subsets of the items might help companies prioritize which GDPR features to emphasize in user interfaces based on theme-level scores.
  • The extraction and clustering process could be repeated for other privacy regulations to produce comparable aligned banks.

Load-bearing premise

Expert agreement that the statements accurately reflect GDPR content is sufficient to establish the items as valid measures of user privacy preferences.

What would settle it

A study that correlates scores on the item bank with users' actual exercise of GDPR rights, such as requesting data erasure or objecting to automated decisions, would test whether the items capture real preferences.
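As a rough illustration of such a test, the snippet below computes a Pearson correlation between a per-respondent bank score and a 0/1 indicator of whether that respondent exercised a right; with a binary variable this equals the point-biserial correlation. All numbers are invented, and the scoring scheme (a single averaged score per respondent) is an assumption rather than something the paper specifies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: bank score per respondent, and whether they later
# filed an erasure request (1) or not (0).
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
exercised_right = [0, 0, 0, 1, 1, 1]

r = pearson_r(scores, exercised_right)
print(f"correlation between stated preference and behavior: {r:.2f}")
```

A real study would also need a significance test and a sample far larger than six respondents; the point here is only the shape of the comparison.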

Figures

Figures reproduced from arXiv: 2605.24307 by Amirpouya Ghasemaghaei, Corey Pittman, David Mohaisen, Joseph J. LaViola Jr, Mykola Maslych, Trung Cuong Dang, Yahya Hmaiti.

Figure 1: GPPI complements legacy privacy-concern scales.
Figure 2: Pipeline for building GPPI. Starting from GDPR text, we created 669 statements (Stage 1). Two expert-review rounds …

Original abstract

Privacy measurement instruments (e.g., CFIP, IUIPC, PAQ) predate GDPR by over a decade and measure privacy concerns, distinct from preferences for regulatory protections (e.g., data portability, erasure, automated decision-making rights). This leaves practitioners without tools to assess whether users value the GDPR mechanisms implemented in compliant policies. We developed a GDPR-grounded privacy preference measurement item bank by extracting 669 statements from all 99 GDPR articles, validated by: (1) two-round expert review achieving full consensus on accuracy, (2) semantic clustering into 10 parent themes and 87 subthemes, and (3) consensus review with 50 privacy experts (5 per theme) using a larger or equal than 4/5 vote retention threshold. The final 527-item bank comprises 9 parent themes and 73 subthemes (18 to 112 items per parent theme, 1 to 29 per subtheme), enabling targeted measurement across granularities while covering GDPR at mean pairwise expert agreement of approx. 85%. This work introduces a complementary measurement dimension aligning user preferences with regulatory mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a GDPR-aligned Privacy Preference Item Bank (GPPI) by extracting 669 statements from all 99 GDPR articles. These were validated via a two-round expert review achieving full consensus on accuracy to the regulation, followed by semantic clustering into 10 parent themes and 87 subthemes, and a final consensus review by 50 privacy experts (5 per theme) using a ≥4/5 retention threshold. The resulting 527-item bank spans 9 parent themes and 73 subthemes (18–112 items per parent theme), with mean pairwise expert agreement of approximately 85%. The instrument is positioned as enabling targeted measurement of user preferences for specific GDPR mechanisms (e.g., data portability, erasure, automated decision-making), distinct from and complementary to existing concern scales such as IUIPC, CFIP, and PAQ.

Significance. If the central claim holds, the work supplies a systematically derived, regulation-grounded item bank that could support HCI and privacy researchers in assessing user valuation of concrete GDPR rights rather than abstract concerns. The multi-stage expert process and full coverage of the 99 articles represent a strength in systematic construction. However, the significance is constrained by the absence of any user-level validation data, which limits claims about its function as a preference measure.

major comments (2)
  1. [Abstract] The claim that the 527-item bank 'enables targeted measurement' of user privacy preferences aligned with GDPR mechanisms rests entirely on expert consensus regarding fidelity to the regulatory text. No user data, factor analysis, reliability coefficients, behavioral correlations, or pilot testing with actual users are reported. Expert agreement on whether statements accurately reflect GDPR content is necessary but not sufficient to establish the items as valid measures of what users prefer or value.
  2. [Validation and Clustering sections] The three-step process (two-round expert review, semantic clustering, 50-expert consensus with ≥4/5 retention) is described in detail, yet the manuscript provides no evidence that the retained items correlate with user preferences or behavior. This assumption is load-bearing for the instrument's stated purpose as a preference measurement tool rather than a GDPR paraphrase collection.
minor comments (2)
  1. [Abstract] The phrasing 'larger or equal than 4/5 vote retention threshold' is grammatically awkward and should be revised to 'greater than or equal to 4/5'.
  2. [Abstract] 'approx. 85%' should be written as 'approximately 85%' for formal consistency.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive review. The manuscript's core contribution is the systematic extraction and expert-validated construction of a GDPR-aligned item bank; we agree that this does not constitute psychometric validation of the items as user preference measures and will revise claims and add explicit scope limitations accordingly.

  1. Referee: [Abstract] The claim that the 527-item bank 'enables targeted measurement' of user privacy preferences aligned with GDPR mechanisms rests entirely on expert consensus regarding fidelity to the regulatory text. No user data, factor analysis, reliability coefficients, behavioral correlations, or pilot testing with actual users are reported. Expert agreement on whether statements accurately reflect GDPR content is necessary but not sufficient to establish the items as valid measures of what users prefer or value.

    Authors: We agree that expert consensus establishes regulatory fidelity but is not sufficient to demonstrate that the items validly measure user preferences or values. The paper distinguishes the GPPI from concern scales by offering items tied to specific GDPR mechanisms, but does not report user data. We will revise the abstract to replace 'enables targeted measurement' with language indicating that the bank supplies GDPR-grounded items intended to support such measurement, while noting the need for future user validation.
    Revision: yes

  2. Referee: [Validation and Clustering sections] The three-step process (two-round expert review, semantic clustering, 50-expert consensus with ≥4/5 retention) is described in detail, yet the manuscript provides no evidence that the retained items correlate with user preferences or behavior. This assumption is load-bearing for the instrument's stated purpose as a preference measurement tool rather than a GDPR paraphrase collection.

    Authors: The described process validates accuracy to the GDPR text and produces a thematically organized bank, but we acknowledge that no evidence of correlation with user preferences or behavior is provided. The manuscript presents the bank as a regulation-derived resource rather than a fully validated psychometric instrument. We will revise the validation and discussion sections to explicitly state that user-level validation remains necessary and is outside the scope of the current work.
    Revision: yes

standing simulated objections not resolved
  • Absence of user-level validation data (correlations, reliability, behavioral links), which cannot be supplied from the existing manuscript without new empirical studies.
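Reliability is one of the missing pieces named above. A standard internal-consistency coefficient is Cronbach's alpha: with k items, alpha = k / (k - 1) × (1 − sum of item variances / variance of the total score). The sketch below shows how it could be computed once user responses exist for a subtheme; the response matrix is invented and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses: list of respondents, each a list of k item scores.
    Uses sample variance for both the items and the total score.
    """
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented 5-point ratings from 5 respondents on a 3-item subtheme.
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha speaks only to internal consistency; it would not by itself answer the referee's concern about whether the items track what users actually prefer.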

Circularity Check

0 steps flagged

No circularity: derivation is extraction + external expert consensus from public GDPR text


The paper's chain consists of (1) direct extraction of 669 statements from the 99 public GDPR articles, (2) two-round expert review for fidelity to the regulation, (3) semantic clustering into themes, and (4) 50-expert consensus retention. None of these steps invoke self-citations as load-bearing premises, fitted parameters renamed as predictions, self-definitional loops, or uniqueness theorems from the authors' prior work. The final 527-item bank is the direct output of this process; its claim to enable 'targeted measurement' rests on the described consensus procedure rather than any reduction to prior fitted values or internal definitions. External expert panels and the public regulatory text serve as independent benchmarks, so the work is self-contained with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that regulatory text extraction plus expert consensus produces a valid preference measurement tool; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: expert consensus via a ≥4/5 vote threshold accurately validates extracted statements as representing GDPR content and user-relevant preferences.
    The retention rule is applied without reference to empirical user testing or behavioral correlation.
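The retention rule itself is simple to state in code. The sketch below assumes each statement receives one keep/drop vote from each of the 5 experts assigned to its theme and is retained when at least 4 vote to keep; the function name, statement IDs, and votes are hypothetical.

```python
def retain_items(votes_by_item, min_keep=4, panel_size=5):
    """Return IDs of statements with at least `min_keep` keep-votes.

    votes_by_item: dict mapping a statement ID to its list of votes
    (1 = keep, 0 = drop), one vote per expert on the theme's panel.
    """
    kept = []
    for item_id, votes in votes_by_item.items():
        if len(votes) != panel_size:
            raise ValueError(f"{item_id}: expected {panel_size} votes, got {len(votes)}")
        if sum(votes) >= min_keep:
            kept.append(item_id)
    return kept


# Invented votes for three statements.
votes = {
    "s001": [1, 1, 1, 1, 1],  # 5/5 keep -> retained
    "s002": [1, 1, 1, 1, 0],  # 4/5 keep -> retained
    "s003": [1, 1, 1, 0, 0],  # 3/5 keep -> dropped
}
kept = retain_items(votes)
print(kept)
```

Note that the rule operates only on expert votes, which is exactly why the ledger flags it as an assumption about user relevance.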


