Modernizing User Privacy Preference Measurement through GPPI: A GDPR-aligned Privacy Preference Item Bank

Amirpouya Ghasemaghaei; Corey Pittman; David Mohaisen; Joseph J. LaViola Jr; Mykola Maslych; Trung Cuong Dang; Yahya Hmaiti

arXiv: 2605.24307 · v1 · pith:4BBGU4JFnew · submitted 2026-05-23 · cs.HC · cs.CR · cs.CY

Pith reviewed 2026-06-30 13:09 UTC · model grok-4.3

classification: cs.HC · cs.CR · cs.CY

keywords: privacy preferences, GDPR, measurement instrument, item bank, expert validation, data protection, regulatory mechanisms, user survey

The pith

A 527-item bank derived from all 99 GDPR articles measures user preferences for specific regulatory privacy protections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a measurement instrument for privacy preferences grounded directly in GDPR text rather than general concerns. Existing tools predate the regulation and do not assess support for its concrete mechanisms such as data erasure or portability rights. The authors extract statements from every article, apply expert reviews and clustering to refine them, and produce an organized bank usable at different levels of detail. A sympathetic reader would care because the bank offers a way to check whether people actually value the protections that compliant systems must implement.

Core claim

By extracting 669 statements from the 99 articles of the GDPR and validating them through two rounds of expert review plus consensus voting by 50 specialists, the work yields a final 527-item bank organized into 9 parent themes and 73 subthemes. The items achieve mean pairwise expert agreement of approximately 85 percent on coverage of the regulation. This bank supplies a complementary dimension for measuring user preferences for regulatory mechanisms instead of abstract privacy concerns.
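
The summary above does not say how the agreement figure was computed. One common reading of "mean pairwise agreement" is simple percent agreement averaged over every pair of raters. The sketch below assumes that reading. The function name, the keep/drop vote encoding, and the panel values are hypothetical illustrations, not the paper's data.

```python
from itertools import combinations


def mean_pairwise_agreement(votes):
    """Average, over all expert pairs, of the fraction of items voted alike.

    votes: one list per expert, holding a keep/drop (True/False) vote per item.
    Assumption: plain percent agreement, not a chance-corrected statistic.
    """
    n_items = len(votes[0])
    pair_scores = []
    for expert_a, expert_b in combinations(votes, 2):
        same = sum(1 for x, y in zip(expert_a, expert_b) if x == y)
        pair_scores.append(same / n_items)
    return sum(pair_scores) / len(pair_scores)


# Hypothetical panel: 5 experts voting on 4 items (illustrative values only).
panel = [
    [True, True, False, True],
    [True, True, False, False],
    [True, False, False, True],
    [True, True, True, True],
    [True, True, False, True],
]
agreement = mean_pairwise_agreement(panel)
print(f"mean pairwise agreement: {agreement:.2f}")
```

Percent agreement ignores agreement expected by chance, so a chance-corrected statistic such as kappa could give a different picture for the same votes.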

What carries the argument

The GPPI item bank, formed by extracting statements from GDPR articles and clustering them into expert-validated themes for use at varying granularities.

If this is right

The bank enables assessment of user valuation for concrete GDPR rights including data portability, erasure, and restrictions on automated decision-making.
Measurement can target broad parent themes or narrow subthemes depending on the needed level of detail.
Practitioners gain a tool to evaluate whether implemented privacy policies align with what users prefer under the regulation.
The structure supports repeated use across studies while maintaining direct ties to the full text of the 99 articles.
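
The two levels of detail listed above can be illustrated with a small scoring sketch. It assumes each item carries a parent theme and a subtheme label and that respondents give a 1 to 5 rating. The theme names, item ids, and rating scale are placeholders, since this page does not list GPPI's actual labels or response format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical items: (parent_theme, subtheme, item_id). Placeholder labels only.
ITEMS = [
    ("data_subject_rights", "erasure", "q1"),
    ("data_subject_rights", "erasure", "q2"),
    ("data_subject_rights", "portability", "q3"),
    ("automated_decisions", "profiling", "q4"),
]


def score(responses, level):
    """Average ratings per parent theme (level=0) or per (parent, subtheme) pair (level=1)."""
    groups = defaultdict(list)
    for parent, sub, item_id in ITEMS:
        if item_id in responses:  # skip items the respondent was not shown
            key = parent if level == 0 else (parent, sub)
            groups[key].append(responses[item_id])
    return {key: mean(values) for key, values in groups.items()}


# One respondent's assumed 1-5 ratings (synthetic).
responses = {"q1": 5, "q2": 3, "q3": 4, "q4": 2}
by_parent = score(responses, level=0)
by_sub = score(responses, level=1)
print(by_parent)
print(by_sub)
```

Because each item keeps both labels, the same responses can be summarized broadly or narrowly without re-administering the survey.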

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bank could be administered alongside behavioral measures to check whether stated preferences predict actions like filing access requests.
Subsets of the items might help companies prioritize which GDPR features to emphasize in user interfaces based on theme-level scores.
The extraction and clustering process could be repeated for other privacy regulations to produce comparable aligned banks.

Load-bearing premise

Expert agreement that the statements accurately reflect GDPR content is sufficient to establish the items as valid measures of user privacy preferences.

What would settle it

A study that correlates scores on the item bank with users' actual exercise of GDPR rights, such as requesting data erasure or objecting to automated decisions, would test whether the items capture real preferences.
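
A minimal sketch of such a check, assuming each participant has one bank score and one yes/no record of having exercised a right. With a binary outcome, the Pearson coefficient computed here is the point-biserial correlation. All values are synthetic and chosen only to make the example run; they are not results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; equals the point-biserial r when y is coded 0/1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Synthetic data: mean bank score per participant, and whether that
# participant later filed an erasure request (1) or not (0).
scores = [4.5, 4.0, 2.0, 3.5, 1.5, 5.0]
exercised = [1, 1, 0, 0, 0, 1]
r = pearson(scores, exercised)
print(f"point-biserial r = {r:.2f}")
```

A coefficient near zero in a real sample would suggest that stated preferences on the bank do not track behavior; a clearly positive one would support the link the study is meant to test.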

Figures

Figures reproduced from arXiv: 2605.24307 by Amirpouya Ghasemaghaei, Corey Pittman, David Mohaisen, Joseph J. LaViola Jr, Mykola Maslych, Trung Cuong Dang, Yahya Hmaiti.

**Figure 2.** Pipeline for building GPPI. Starting from GDPR text, we created 669 statements (Stage 1). Two expert-review rounds …

Original abstract

Privacy measurement instruments (e.g., CFIP, IUIPC, PAQ) predate GDPR by over a decade and measure privacy concerns, distinct from preferences for regulatory protections (e.g., data portability, erasure, automated decision-making rights). This leaves practitioners without tools to assess whether users value the GDPR mechanisms implemented in compliant policies. We developed a GDPR-grounded privacy preference measurement item bank by extracting 669 statements from all 99 GDPR articles, validated by: (1) two-round expert review achieving full consensus on accuracy, (2) semantic clustering into 10 parent themes and 87 subthemes, and (3) consensus review with 50 privacy experts (5 per theme) using a larger or equal than 4/5 vote retention threshold. The final 527-item bank comprises 9 parent themes and 73 subthemes (18 to 112 items per parent theme, 1 to 29 per subtheme), enabling targeted measurement across granularities while covering GDPR at mean pairwise expert agreement of approx. 85%. This work introduces a complementary measurement dimension aligning user preferences with regulatory mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a GDPR-derived item bank through expert review, but expert fidelity alone does not establish it as a valid user preference measure.

The main contribution here is a systematic extraction of 669 statements from the full GDPR text, followed by expert filtering down to 527 items organized into nine themes. This produces a resource that maps directly onto specific regulatory mechanisms like erasure rights or automated decision-making, which older scales such as IUIPC do not target.

The process is straightforward and documented: two rounds of expert accuracy checks, semantic clustering, and a 50-expert consensus round with a 4/5 retention rule. The reported mean pairwise agreement of about 85% shows the items stay close to the regulation. That kind of traceable construction from the actual law is useful for anyone needing GDPR-aligned questions.

The gap is that the paper stops at expert judgment. No user responses, no reliability coefficients, no factor analysis, and no behavioral correlations are described. Expert agreement that an item reflects the GDPR text is necessary but does not show whether users actually hold preferences on those points or whether the items predict choices. Without that link, the claim that the bank enables targeted preference measurement rests on an assumption that remains untested.

This work is aimed at HCI and privacy researchers who want survey items tied to current European rules. It could serve as a starting catalog if later studies add user validation.

I would send it for peer review. The extraction and clustering steps are solid enough to merit referee input on the next validation steps.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a GDPR-aligned Privacy Preference Item Bank (GPPI) by extracting 669 statements from all 99 GDPR articles. These were validated via a two-round expert review achieving full consensus on accuracy to the regulation, followed by semantic clustering into 10 parent themes and 87 subthemes, and a final consensus review by 50 privacy experts (5 per theme) using a ≥4/5 retention threshold. The resulting 527-item bank spans 9 parent themes and 73 subthemes (18–112 items per parent theme), with mean pairwise expert agreement of approximately 85%. The instrument is positioned as enabling targeted measurement of user preferences for specific GDPR mechanisms (e.g., data portability, erasure, automated decision-making), distinct from and complementary to existing concern scales such as IUIPC, CFIP, and PAQ.

Significance. If the central claim holds, the work supplies a systematically derived, regulation-grounded item bank that could support HCI and privacy researchers in assessing user valuation of concrete GDPR rights rather than abstract concerns. The multi-stage expert process and full coverage of the 99 articles represent a strength in systematic construction. However, the significance is constrained by the absence of any user-level validation data, which limits claims about its function as a preference measure.

major comments (2)

[Abstract] The claim that the 527-item bank 'enables targeted measurement' of user privacy preferences aligned with GDPR mechanisms rests entirely on expert consensus regarding fidelity to the regulatory text. No user data, factor analysis, reliability coefficients, behavioral correlations, or pilot testing with actual users are reported. Expert agreement on whether statements accurately reflect GDPR content is necessary but not sufficient to establish the items as valid measures of what users prefer or value.
[Validation and Clustering sections] The three-step process (two-round expert review, semantic clustering, 50-expert consensus with ≥4/5 retention) is described in detail, yet the manuscript provides no evidence that the retained items correlate with user preferences or behavior. This assumption is load-bearing for the instrument's stated purpose as a preference measurement tool rather than a GDPR paraphrase collection.
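
As one illustration of the reliability evidence the major comments ask for, the sketch below computes Cronbach's alpha for a set of items within a single theme. It is a generic textbook calculation, not something the manuscript reports. The response matrix is synthetic and the function name is an illustrative choice.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of per-respondent item ratings.

    rows: one list per respondent, one rating per item (at least 2 items).
    Population variances are used throughout; the ratio is the same as
    with sample variances because the normalizing factor cancels.
    """
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Synthetic ratings: 4 respondents x 3 items from one hypothetical subtheme.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha only speaks to internal consistency. It would complement, not replace, the factor analysis and behavioral correlations the comments also mention.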

minor comments (2)

[Abstract] The phrasing 'larger or equal than 4/5 vote retention threshold' is grammatically awkward and should be revised to 'greater than or equal to 4/5'.
[Abstract] 'approx. 85%' should be written as 'approximately 85%' for formal consistency.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive review. The manuscript's core contribution is the systematic extraction and expert-validated construction of a GDPR-aligned item bank; we agree that this does not constitute psychometric validation of the items as user preference measures and will revise claims and add explicit scope limitations accordingly.

Referee: [Abstract] The claim that the 527-item bank 'enables targeted measurement' of user privacy preferences aligned with GDPR mechanisms rests entirely on expert consensus regarding fidelity to the regulatory text. No user data, factor analysis, reliability coefficients, behavioral correlations, or pilot testing with actual users are reported. Expert agreement on whether statements accurately reflect GDPR content is necessary but not sufficient to establish the items as valid measures of what users prefer or value.

Authors: We agree that expert consensus establishes regulatory fidelity but is not sufficient to demonstrate that the items validly measure user preferences or values. The paper distinguishes the GPPI from concern scales by offering items tied to specific GDPR mechanisms, but does not report user data. We will revise the abstract to replace 'enables targeted measurement' with language indicating that the bank supplies GDPR-grounded items intended to support such measurement, while noting the need for future user validation.
revision: yes

Referee: [Validation and Clustering sections] The three-step process (two-round expert review, semantic clustering, 50-expert consensus with ≥4/5 retention) is described in detail, yet the manuscript provides no evidence that the retained items correlate with user preferences or behavior. This assumption is load-bearing for the instrument's stated purpose as a preference measurement tool rather than a GDPR paraphrase collection.

Authors: The described process validates accuracy to the GDPR text and produces a thematically organized bank, but we acknowledge that no evidence of correlation with user preferences or behavior is provided. The manuscript presents the bank as a regulation-derived resource rather than a fully validated psychometric instrument. We will revise the validation and discussion sections to explicitly state that user-level validation remains necessary and is outside the scope of the current work.
revision: yes

standing simulated objections not resolved

Absence of user-level validation data (correlations, reliability, behavioral links), which cannot be supplied from the existing manuscript without new empirical studies.

Circularity Check

0 steps flagged

No circularity: derivation is extraction + external expert consensus from public GDPR text

The paper's chain consists of (1) direct extraction of 669 statements from the 99 public GDPR articles, (2) two-round expert review for fidelity to the regulation, (3) semantic clustering into themes, and (4) 50-expert consensus retention. None of these steps invoke self-citations as load-bearing premises, fitted parameters renamed as predictions, self-definitional loops, or uniqueness theorems from the authors' prior work. The final 527-item bank is the direct output of this process; its claim to enable 'targeted measurement' rests on the described consensus procedure rather than any reduction to prior fitted values or internal definitions. External expert panels and the public regulatory text serve as independent benchmarks, so the work is self-contained with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that regulatory text extraction plus expert consensus produces a valid preference measurement tool; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Expert consensus via a ≥4/5 vote threshold accurately validates extracted statements as representing GDPR content and user-relevant preferences.
The retention rule is applied without reference to empirical user testing or behavioral correlation.
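
The retention rule named in this axiom can be sketched as a simple vote count, assuming each item receives one keep (1) or drop (0) vote from each of the five experts assigned to its theme. The function name, item ids, and vote values are hypothetical.

```python
def retain_items(votes_by_item, threshold=4):
    """Return the ids of items whose keep-vote count meets the threshold.

    votes_by_item: maps an item id to its list of 0/1 expert votes.
    threshold=4 mirrors the stated "at least 4 of 5" rule.
    """
    return [
        item_id
        for item_id, votes in votes_by_item.items()
        if sum(votes) >= threshold
    ]


# Hypothetical votes from a five-expert panel (illustrative values only).
votes_by_item = {
    "item_a": [1, 1, 1, 1, 0],  # 4 of 5 keep votes: retained
    "item_b": [1, 1, 1, 0, 0],  # 3 of 5 keep votes: dropped
    "item_c": [1, 1, 1, 1, 1],  # unanimous: retained
}
kept = retain_items(votes_by_item)
print(kept)
```

The rule filters on what experts think of an item. It says nothing about how users respond to it, which is the gap the ledger entry points at.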

Reference graph

Works this paper leans on

132 extracted references · 51 canonical work pages · 8 internal anchors

[1]

California Consumer Privacy Act

2018. California Consumer Privacy Act. https://leginfo.legislature.ca.gov/faces/ billTextClient.xhtml?bill_id=201720180AB375

2018
[2]

Tech- nical Report

2019.Special Eurobarometer 487a: The General Data Protection Regulation. Tech- nical Report. European Commission, Brussels, Belgium. Public Opinion in the European Union; Directorate-General for Communication

2019
[3]

California Privacy Rights Act

2020. California Privacy Rights Act. https://vig.cdn.sos.ca.gov/2020/general/ pdf/topl-prop24.pdf

2020
[4]

Alessandro Acquisti, Laura Brandimarte, and George Loewenstein. 2015. Privacy and Human Behavior in the Age of Information.Science347, 6221 (2015), 509–

2015
[5]

doi:10.1126/science.aaa1465

work page doi:10.1126/science.aaa1465
[6]

1996.Gazing into the Oracle: The Delphi Method and Its Application to Social Policy and Public Health

Michael Adler and Erio Ziglio. 1996.Gazing into the Oracle: The Delphi Method and Its Application to Social Policy and Public Health. Jessica Kingsley Publishers

1996
[7]

Abdulrahman Alabduljabbar, Ahmed Abusnaina, Ülkü Meteriz-Yildiran, and David Mohaisen. 2021. TLDR: Deep Learning-Based Automated Privacy Policy Annotation with Key Policy Highlights. InProceedings of the Workshop on Privacy in the Electronic Society (WPES). 103–118. doi:10.1145/3463676.3485608

work page doi:10.1145/3463676.3485608 2021
[8]

Lemi Baruh and Mihaela Popescu. 2017. Big Data Analytics and the Limits of Privacy Self-Management.New Media & Society19, 4 (2017), 579–596

2017
[9]

Lemi Baruh, Ekin Secinti, and Zeynep Cemalcilar. 2017. Online privacy concerns and privacy management: A meta-analytical review.Journal of Communication 67, 1 (2017), 26–53

2017
[10]

Becher and Uri Benoliel

Shmuel I. Becher and Uri Benoliel. 2021. Law in Books and Law in Action: The Readability of Privacy Policies and the GDPR. InConsumer Law and Economics. 179–204

2021
[11]

Merkouris, Nicki A

Rimke Bijker, Stephanie S. Merkouris, Nicki A. Dowling, and Simone N. Rodda
[12]

doi:10.2196/59050

ChatGPT for Automated Qualitative Research: Content Analysis.Journal of Medical Internet Research26 (2024), e59050. doi:10.2196/59050

work page doi:10.2196/59050 2024
[13]

Boateng, Torsten B

Godfred O. Boateng, Torsten B. Neilands, Edward A. Frongillo, Hugo R. Melgar- Quiñonez, and Sera L. Young. 2018. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer.Frontiers in Public Health6 (2018), 149

2018
[14]

Alex Bowyer, Jack Holt, Josephine Go Jefferies, Rob Wilson, David Kirk, and Jan David Smeddinck. 2022. Human-GDPR Interaction: Practical Experiences of Accessing Personal Data. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI). Article 106. doi:10.1145/3491102.3501947

work page doi:10.1145/3491102.3501947 2022
[15]

2020.The Brussels Effect: How the European Union Rules the World

Anu Bradford. 2020.The Brussels Effect: How the European Union Rules the World. Oxford University Press. doi:10.1093/oso/9780190088583.001.0001

work page doi:10.1093/oso/9780190088583.001.0001 2020
[16]

Virginia Braun and Victoria Clarke. 2006. Using Thematic Analysis in Psychol- ogy.Qualitative Research in Psychology3, 2 (2006), 77–101

2006
[17]

Kaplan, et al

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, et al . 2020. Language Models Are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). 1877–1901. https://proceedings.neurips.cc/paper_files/paper/2020/ file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

2020
[18]

Joinson, and Ulf-Dietrich Reips

Tom Buchanan, Carina Paine, Adam N. Joinson, and Ulf-Dietrich Reips. 2006. Development of Measures of Online Privacy Concern and Protection for Use on the Internet.Journal of the American Society for Information Science and Technology58, 2 (2006), 157–165

2006
[19]

Campbell and Donald W

Donald T. Campbell and Donald W. Fiske. 1959. Convergent and Discriminant Validation by the Multitrait–Multimethod Matrix.Psychological Bulletin56, 2 (1959), 81

1959
[20]

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al
[21]

Universal Sentence Encoder.arXiv preprint arXiv:1803.11175(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Anupam Chander, Margot E Kaminski, and William McGeveran. 2020. Catalyz- ing privacy law.Minn. L. Rev.105 (2020), 1733. doi:10.2139/ssrn.3433922

work page doi:10.2139/ssrn.3433922 2020
[23]

Jessica Colnago, Lorrie Faith Cranor, and Alessandro Acquisti. 2023. Is There a Reverse Privacy Paradox? An Exploratory Analysis of Gaps Between Pri- vacy Perspectives and Privacy-Seeking Behaviors. InProceedings on Privacy Enhancing Technologies Symposium (PETS). 455–476

2023
[24]

Jessica Colnago, Lorrie Faith Cranor, Alessandro Acquisti, and Kate Hazel Stan- ton. 2022. Is It a Concern or a Preference? An Investigation into the Ability of Privacy Scales to Capture and Distinguish Granular Privacy Constructs. In Proceedings of the Eighteenth Symposium on Usable Privacy and Security (SOUPS). 331–346

2022
[25]

Crane, Dennis L

Paul K. Crane, Dennis L. Hart, Laura E. Gibbons, and Karon F. Cook. 2006. A 37-Item Shoulder Functional Status Item Pool Had Negligible Differential Item Functioning.Journal of Clinical Epidemiology59, 5 (2006), 478–484. doi:10.1016/ j.jclinepi.2005.10.007

2006
[26]

Hao Cui, Rahmadi Trimananda, Athina Markopoulou, and Scott Jordan. 2023. PoliGraph: Automated Privacy Policy Analysis Using Knowledge Graphs. In Proceedings of the USENIX Security Symposium (USENIX Security). 13

2023
[27]

Matthias Degeling, Christine Utz, Christopher Lentzsch, Henry Hosseini, Florian Schaub, and Thorsten Holz. 2019. We Value Your Privacy. . . Now Take Some Cookies: Measuring the GDPR’s Impact on Web Privacy. InProceedings of the 2019 Network and Distributed System Security Symposium (NDSS). doi:10.14722/ ndss.2019.23378

work page arXiv 2019
[28]

Norman K. Denzin. 2017.The Research Act: A Theoretical Introduction to Socio- logical Methods. Routledge

2017
[29]

DeVellis and Carolyn T

Robert F. DeVellis and Carolyn T. Thorpe. 2021.Scale Development: Theory and Applications. SAGE Publications

2021
[30]

Right of Access

Mariano Di Martino, Pieter Robyns, Winnie Weyts, Peter Quax, Wim Lamotte, and Ken Andries. 2019. Personal Information Leakage by Abusing the GDPR “Right of Access”. InProceedings of the Fifteenth Symposium on Usable Privacy and Security (SOUPS). 371–385. https://www.usenix.org/conference/soups2019/ presentation/dimartino

2019
[31]

Diamond, Robert C

Ivan R. Diamond, Robert C. Grant, Brian M. Feldman, Paul B. Pencharz, Simon C. Ling, Aideen M. Moore, and Paul W. Wales. 2014. Defining Consensus: A Systematic Review Recommends Methodologic Criteria for Reporting of Delphi Studies.Journal of Clinical Epidemiology67, 4 (2014), 401–409

2014
[32]

Tobias Dienlin and Sabine Trepte. 2015. Is the Privacy Paradox a Relic of the Past? An In-Depth Analysis of Privacy Attitudes and Privacy Behaviors. European Journal of Social Psychology45, 3 (2015), 285–297

2015
[33]

Tamara Dinev and Paul Hart. 2006. An Extended Privacy Calculus Model for E-Commerce Transactions.Information Systems Research17, 1 (2006), 61–80. doi:10.1287/isre.1060.0080

work page doi:10.1287/isre.1060.0080 2006
[34]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2025. The FAISS Library.IEEE Transactions on Big Data(2025)

2025
[35]

European Commission, Directorate-General for Justice and Consumers and Kantar. 2019. The General Data Protection Regulation: Special Eurobarometer 487a. doi:10.2838/43726

work page doi:10.2838/43726 2019
[36]

European Union. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data (General Data Protection Regulation).Official Journal of the European UnionL119 (2016), 1–88. https://eur-lex.europa.eu/eli/r...

2016
[37]

Uwe Flick. 2018. Triangulation in Data Collection

2018
[38]

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. 2023. Retrieval-Augmented Gen- eration for Large Language Models: A Survey.arXiv preprint arXiv:2312.10997 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[39]

Hana Habib and Lorrie Faith Cranor. 2022. Evaluating the Usability of Privacy Choice Mechanisms. InProceedings of the Symposium on Usable Privacy and Security (SOUPS). 273–289. https://www.usenix.org/conference/soups2022/ presentation/habib

2022
[40]

Okay, Whatever

Hana Habib, Megan Li, Ellie Young, and Lorrie Cranor. 2022. “Okay, Whatever”: An Evaluation of Cookie Consent Interfaces. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI). Article 621. doi:10. 1145/3491102.3501985

work page arXiv 2022
[41]

It’s a Scavenger Hunt

Hana Habib, Sarah Pearman, Jiamin Wang, Yixin Zou, Alessandro Acquisti, Lorrie Faith Cranor, Norman Sadeh, and Florian Schaub. 2020. “It’s a Scavenger Hunt”: Usability of Websites’ Opt-Out and Data Deletion Choices. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI). 1–12. doi:10.1145/3313831.3376511

work page doi:10.1145/3313831.3376511 2020
[42]

Hana Habib, Yixin Zou, Aditi Jannu, Neha Sridhar, Chelse Swoopes, Alessandro Acquisti, Lorrie Faith Cranor, Norman Sadeh, and Florian Schaub. 2019. An Empirical Analysis of Data Deletion and Opt-Out Choices on 150 Websites. In Proceedings of the Fifteenth Symposium on Usable Privacy and Security (SOUPS). 387–406. https://www.usenix.org/conference/soups201...

2019
[43]

Shin, and Karl Aberer

Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, and Karl Aberer. 2018. Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning. InProceedings of the USENIX Security Symposium (USENIX Security). 531–548

2018
[44]

David Haynes and Lucy Robinson. 2021. A Delphi Study of Risks to Individuals Who Disclose Personal Information Online.Journal of Information Science47, 6 (2021), 792–808. doi:10.1177/0165551521992756

work page doi:10.1177/0165551521992756 2021
[45]

Timothy R. Hinkin. 1995. A Review of Scale Development Practices in the Study of Organizations.Journal of Management21, 5 (1995), 967–988

1995
[46]

Weiyin Hong and James Y. L. Thong. 2013. Internet Privacy Concerns: An Integrated Conceptualization and Four Empirical Studies.MIS Quarterly(2013), 275–298

2013
[47]

Henry Hosseini, Martin Degeling, Christine Utz, and Thomas Hupperich. 2021. Unifying Privacy Policy Detection.Proceedings on Privacy Enhancing Technolo- gies (PoPETs)4 (2021)

2021
[48]

Sandford

Chia-Chien Hsu and Brian A. Sandford. 2007. The Delphi Technique: Making Sense of Consensus.Practical Assessment, Research, and Evaluation12, 10 (2007), 1–8

2007
[49]

Lawrence Hubert and Phipps Arabie. 1985. Comparing Partitions.Journal of Classification2, 1 (1985), 193–218

1985
[50]

Jack Jamieson and Naomi Yamashita. 2023. Escaping the Walled Garden? User Perspectives of Control in Data Portability for Social Media.Proceedings of the ACM on Human-Computer Interaction (CSCW)7, 2, Article 339 (2023). doi:10. 1145/3610188

2023
[51]

Carlos Jensen and Colin Potts. 2004. Privacy Policies as Decision-Making Tools: An Evaluation of Online Privacy Notices. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI). 471–478. doi:10.1145/985692.985752

work page doi:10.1145/985692.985752 2004
[52]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. Billion-Scale Similarity Search with GPUs.arXiv preprint arXiv:1702.08734(2017). http://arxiv.org/abs/ 1702.08734

work page internal anchor Pith review Pith/arXiv arXiv 2017
[53]

Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder
[54]

Nutrition Label

A “Nutrition Label” for Privacy. InProceedings of the Symposium on Usable Privacy and Security (SOUPS). doi:10.1145/1572532.1572538

work page doi:10.1145/1572532.1572538
[55]

Kelley, Lucian Cesca, Joanna Bresee, and Lorrie F

Patrick G. Kelley, Lucian Cesca, Joanna Bresee, and Lorrie F. Cranor. 2010. Standardizing Privacy Notices: An Online Study of the Nutrition Label Approach. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI). 1573–1582. doi:10.1145/1753326.1753561

work page doi:10.1145/1753326.1753561 2010
[56]

Spyros Kokolakis. 2017. Privacy attitudes and privacy behaviour: A review of current research on the privacy paradox phenomenon.Computers & security64 (2017), 122–134

2017
[57]

2005.Privacy Indexes: A Survey of Westin’s Studies

Ponnurangam Kumaraguru and Lorrie Faith Cranor. 2005.Privacy Indexes: A Survey of Westin’s Studies. Technical Report CMU-ISRI-5-138. Institute for Software Research International, School of Computer Science, Carnegie Mellon University

2005
[58]

Richard Landis and Gary G

J. Richard Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data.Biometrics(1977), 159–174

1977
[59]

LangChain AI. 2025. LangChain GitHub Repository. https://github.com/ langchain-ai/langchain

2025
[60]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, et al . 2020. Retrieval-Augmented Generation for Knowledge- Intensive NLP Tasks. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS). 9459–9474. https://proceedings.neurips.cc/paper_ files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf

2020
[61]

Lucian Li. 2024. Tracing the Genealogies of Ideas with Sentence Embeddings. InProceedings of the International Conference on Natural Language Processing for Digital Humanities (NLP4DH). 9–16. doi:10.18653/v1/2024.nlp4dh-1.2

work page doi:10.18653/v1/2024.nlp4dh-1.2 2024
[62]

Thomas Linden, Rishabh Khandelwal, Hamza Harkous, and Kassem Fawaz. 2020. The Privacy Policy Landscape After the GDPR.Proceedings on Privacy Enhancing Technologies (PoPETs)2020, 1 (2020), 47–64. doi:10.2478/popets-2020-0004

work page doi:10.2478/popets-2020-0004 2020
[63]

Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. 2023. G-Eval: NLG Evaluation Using GPT-4 with Better Human Alignment. arXiv preprint arXiv:2303.16634(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[64]

S. Lloyd. 1982. Least Squares Quantization in PCM.IEEE Transactions on Information Theory28, 2 (1982), 129–137. doi:10.1109/TIT.1982.1056489

work page doi:10.1109/tit.1982.1056489 1982
[65]

Matthew Lombard, Jennifer Snyder-Duch, and Cheryl Campanella Bracken
[66]

Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability.Human Communication Research28, 4 (2002), 587–604

2002
[67]

Mary R. Lynn. 1986. Determination and Quantification of Content Validity. Nursing Research35, 6 (1986), 382–385. doi:10.1097/00006199-198611000-00017

work page doi:10.1097/00006199-198611000-00017 1986
[68]

Dominique Machuletz and Rainer Böhme. 2019. Multiple Purposes, Multi- ple Problems: A User Study of Consent Dialogs After GDPR.arXiv preprint arXiv:1908.10048(2019)

work page arXiv 2019
[69]

Naresh K Malhotra, Sung S Kim, and James Agarwal. 2004. Internet Users’ Information Privacy Concerns (IUIPC): The Construct, the Scale, and a Causal Model.Information Systems Research15, 4 (2004), 336–355. doi:10.1287/isre.1040. 0032

work page doi:10.1287/isre.1040 2004
[70]

Célestin Matte, Nataliia Bielova, and Cristiana Santos. 2020. Do Cookie Banners Respect My Choice? Measuring Legal Compliance of Banners from IAB Europe’s Transparency and Consent Framework. InProceedings of the 2020 IEEE Sympo- sium on Security and Privacy (S&P). 791–809. doi:10.1109/SP40000.2020.00076

work page doi:10.1109/sp40000.2020.00076 2020
[71]

Mary L. McHugh. 2012. Interrater Reliability: The Kappa Statistic.Biochemia Medica22, 3 (2012), 276–282

2012
[72]

Leland McInnes, John Healy, Steve Astels, et al. 2017. hdbscan: Hierarchical density based clustering.J. Open Source Softw.2, 11 (2017), 205

2017
[73]

Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform man- ifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[74]

Roger E. Millsap. 2012.Statistical Approaches to Measurement Invariance. Rout- ledge

2012
[75]

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?. InProceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 11048–11064. doi:10. 18653/v1/2022.emnlp-main.759

2022
[76]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2022. MTEB: Massive Text Embedding Benchmark.arXiv preprint arXiv:2210.07316(2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[77]

Trung Tin Nguyen, Michael Backes, Ninja Marnau, and Ben Stock. 2021. Share First, Ask Later (or Never?) Studying Violations of GDPR’s Explicit Consent 14 in Android Apps. InProceedings of the USENIX Security Symposium (USENIX Security). 3667–3684

[78]

Trung Tin Nguyen, Michael Backes, and Ben Stock. 2022. Freely Given Consent? Studying Consent Notice of Third-Party Tracking and Its Violations of GDPR in Android Apps. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS). 2369–2383. doi:10.1145/3548606.3560564

[79]

Razieh Nokhbeh Zaeem, Safa Anya, Alex Issa, Jake Nimergood, Isabelle Rogers, Vinay Shah, Ayush Srivastava, and K. Suzanne Barber. 2020. PrivacyCheck v2: A Tool that Recaps Privacy Policies for You. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 3441–3444. doi:10.1145/3340531.3417469

[80]

Patricia A. Norberg, Daniel R. Horne, and David A. Horne. 2007. The Privacy Paradox: Personal Information Disclosure Intentions versus Behaviors. Journal of Consumer Affairs 41, 1 (2007), 100–126. doi:10.1111/j.1745-6606.2006.00070.x

