A Computational Method for Measuring "Open Codes" in Qualitative Analysis

Alexandros Lotsos; Bruce Sherin; Caiyi Wang; Jessica Hullman; John Chen; Lexie Zhao; Michael Horn; Sihan Cheng; Uri Wilensky; Yanjia Zhang

arXiv: 2411.12142 · v4 · submitted 2024-11-19 · 💻 cs.CL · cs.AI · cs.HC · cs.LG

John Chen, Alexandros Lotsos, Sihan Cheng, Caiyi Wang, Lexie Zhao, Yanjia Zhang, Jessica Hullman, Bruce Sherin, Uri Wilensky, Michael Horn

Pith reviewed 2026-05-23 17:58 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.HC · cs.LG

keywords inductive coding · qualitative analysis · codebook merging · LLM-assisted analysis · coverage metric · novelty metric · human-AI collaboration

The pith

Four metrics quantify how much each coder contributes to a merged codebook in inductive qualitative analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to evaluate inductive coding, the process of deriving codes directly from data without predefined categories. It first combines separate codebooks from human or AI coders through an LLM-enriched merging step, then scores each coder's input against the merged result using four metrics: Coverage, Overlap, Novelty, and Divergence. This avoids the need for ground-truth labels that would contradict the exploratory goal of inductive work. Experiments on a human-coded conversation dataset show the metrics can detect problems such as excessive codes or codes unrelated to the source material while remaining stable across repeated runs and different language models.

Core claim

The central claim is that an LLM-enriched algorithm can merge individual codebooks and that the resulting merged codebook serves as a fair reference against which each coder's contribution is measured by Coverage (how much of the merged set the coder covers), Overlap (shared codes), Novelty (unique codes added), and Divergence (codes that differ in interpretation).
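
To make the four definitions above concrete, here is a minimal sketch in Python. It assumes a deliberately simplified, set-based reading of three of the metrics and is not the paper's formulation, since the exact equations are not reproduced on this page. The merged codebook is represented as a hypothetical mapping from each merged code to the set of coders who contributed it; the function name `contribution_scores`, the coder labels, and the example codes are all invented for illustration. Divergence is left out because it depends on how interpretations are compared, which this page does not specify.

```python
from typing import Dict, Set


def contribution_scores(merged: Dict[str, Set[str]], coder: str) -> Dict[str, float]:
    """Score one coder against a merged codebook (simplified, assumed definitions).

    merged maps each merged code to the set of coders who contributed to it.
    """
    total = len(merged)
    if total == 0:
        raise ValueError("merged codebook is empty")

    # Merged codes this coder contributed to.
    mine = [code for code, who in merged.items() if coder in who]
    # Of those, codes that at least one other coder also contributed.
    shared = [code for code in mine if len(merged[code]) > 1]
    # Of those, codes that only this coder contributed.
    unique = [code for code in mine if len(merged[code]) == 1]

    return {
        "coverage": len(mine) / total,
        "overlap": len(shared) / total,
        "novelty": len(unique) / total,
    }


# Hypothetical merged codebook with one human coder and one AI coder.
merged = {
    "asks for help": {"human_1", "ai_1"},
    "shares resource": {"human_1"},
    "expresses frustration": {"ai_1"},
    "off-topic chatter": {"ai_1"},
}
scores = contribution_scores(merged, "human_1")
print(scores)
```

Under this simplification Coverage equals Overlap plus Novelty, so a coder with high Coverage but near-zero Overlap would be contributing mostly codes that nobody else found.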

What carries the argument

The LLM-enriched merging algorithm, which combines codes from multiple coders while attempting to retain their exploratory inputs, paired with the four metrics that compare each original codebook to the merged version.
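
As an illustration of what a merging step has to do, the sketch below groups near-identical code labels from several coders into one merged entry. It is an assumption-laden stand-in: plain string similarity from Python's `difflib` replaces the LLM enrichment, and the function name `merge_codebooks`, the 0.8 threshold, and the example codebooks are hypothetical. The paper's actual algorithm is not described in enough detail on this page to reproduce.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set


def similar(a: str, b: str, threshold: float) -> bool:
    """Case-insensitive string similarity; a crude stand-in for an LLM or embedding check."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_codebooks(codebooks: Dict[str, List[str]], threshold: float = 0.8) -> Dict[str, Set[str]]:
    """Greedily fold each coder's codes into a merged codebook.

    Returns a mapping from merged code label to the set of contributing coders.
    The first label seen for a group is kept as the group's name.
    """
    merged: Dict[str, Set[str]] = {}
    for coder, codes in codebooks.items():
        for code in codes:
            match: Optional[str] = next(
                (label for label in merged if similar(label, code, threshold)), None
            )
            if match is None:
                merged[code] = {coder}      # new merged code
            else:
                merged[match].add(coder)    # fold into an existing merged code
    return merged


# Hypothetical codebooks from one human coder and one AI coder.
codebooks = {
    "human_1": ["asking for help", "sharing a resource"],
    "ai_1": ["Asking for Help", "expressing frustration"],
}
merged = merge_codebooks(codebooks)
for label, coders in merged.items():
    print(label, sorted(coders))
```

Changing the threshold changes which labels collapse together, which is one simple way to see why the merger's settings would need to be reported alongside any metric values.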

If this is right

Different merging algorithms produce different metric values, so the choice of merger must be reported.
The four metrics stay consistent when the pipeline is repeated or when a new LLM is substituted.
The metrics flag concrete coding problems such as coders generating too many codes or introducing codes not supported by the data.
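
One way to check the consistency claim in the list above is to repeat the pipeline and summarize how much each metric moves between runs. The sketch below assumes each run yields a dictionary of metric values for one coder; the function name `stability` and the numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List


def stability(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize per-metric variation across repeated runs for a single coder."""
    if not runs:
        raise ValueError("need at least one run")
    summary: Dict[str, Dict[str, float]] = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = {
            "mean": mean(values),
            "stdev": pstdev(values),              # population standard deviation
            "spread": max(values) - min(values),  # worst-case gap between runs
        }
    return summary


# Illustrative values only: three repeated runs for one hypothetical coder.
runs = [
    {"coverage": 0.50, "novelty": 0.25},
    {"coverage": 0.52, "novelty": 0.24},
    {"coverage": 0.48, "novelty": 0.26},
]
summary = stability(runs)
for metric, stats in summary.items():
    print(metric, stats)
```

A small spread relative to the differences between coders would support reading the metrics as stable; what counts as small is a judgment the page does not fix.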

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Teams could use the novelty and divergence scores to decide whether to retain or revise AI-generated codes before final analysis.
The same measurement approach could be tested on coding tasks outside conversation data, such as interview transcripts or field notes.
If the metrics prove stable, they might serve as an automated check before researchers finalize a shared codebook.

Load-bearing premise

The merged codebook produced by the LLM algorithm fairly represents all exploratory contributions without the model adding its own systematic bias.

What would settle it

Re-running the full pipeline on the same dataset with a different large language model and obtaining substantially different metric scores for the human coders would show the results depend on the specific model chosen.
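
The test described above amounts to comparing per-coder scores obtained under two different merging models. A minimal sketch follows, assuming the scores are already available as dictionaries keyed by coder; `compare_models`, the coder labels, and the values are hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple


def compare_models(a: Dict[str, float], b: Dict[str, float]) -> Tuple[float, bool]:
    """Compare one metric for the same coders under two merging models.

    Returns the largest absolute score shift and whether the coder ranking is unchanged.
    """
    if a.keys() != b.keys():
        raise ValueError("both runs must score the same coders")
    max_shift = max(abs(a[coder] - b[coder]) for coder in a)
    # Rank coders from highest to lowest score; ties are broken by name.
    rank_a = sorted(a, key=lambda coder: (-a[coder], coder))
    rank_b = sorted(b, key=lambda coder: (-b[coder], coder))
    return max_shift, rank_a == rank_b


# Illustrative Coverage scores under two hypothetical merging models.
coverage_model_x = {"human_1": 0.50, "human_2": 0.35, "ai_1": 0.70}
coverage_model_y = {"human_1": 0.48, "human_2": 0.37, "ai_1": 0.66}

max_shift, same_order = compare_models(coverage_model_x, coverage_model_y)
print(f"largest shift: {max_shift:.2f}, ranking preserved: {same_order}")
```

Reporting both numbers separates two questions: whether absolute scores move, and whether the relative standing of coders survives a change of model.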

Figures

Figures reproduced from arXiv: 2411.12142 by Alexandros Lotsos, Bruce Sherin, Caiyi Wang, Jessica Hullman, John Chen, Lexie Zhao, Michael Horn, Sihan Cheng, Uri Wilensky, Yanjia Zhang.

**Figure 1.** A: A conceptual illustration of an ACS merged from csp1 and csp2. B: Measuring csp1 using the merged ACS as a reference. The first step towards calculating our proposed metrics is to aggregate the codebooks produced by multiple individual coders into a single conceptual space that serves as an approximation of “all possible interpretations” of the data, as prescribed by qualitative analysis methods (…

**Figure 2.** Effect of merging LLM on four evaluation …

**Figure 3.** Mean coder metrics across Baseline, Flood …

read the original abstract

Qualitative analysis is critical to understanding human datasets in many social science disciplines. A central method in this process is inductive coding, where researchers identify and interpret codes directly from the datasets themselves. Yet, this exploratory approach poses challenges for meeting methodological expectations (such as "depth" and "variation"), especially as researchers increasingly adopt Generative AI (GAI) for support. Ground-truth-based metrics are insufficient because they contradict the exploratory nature of inductive coding, while manual evaluation can be labor-intensive. This paper presents a theory-informed computational method for measuring inductive coding results from humans and GAI. Our method first merges individual codebooks using an LLM-enriched algorithm. It measures each coder's contribution against the merged result using four novel metrics: Coverage, Overlap, Novelty, and Divergence. Through two experiments on a human-coded online conversation dataset, we 1) reveal the merging algorithm's impact on metrics; 2) validate the metrics' stability and robustness across multiple runs and different LLMs; and 3) showcase the metrics' ability to diagnose coding issues, such as excessive or irrelevant (hallucinated) codes. Our work provides a reliable pathway for ensuring methodological rigor in human-AI qualitative analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives four metrics for scoring inductive coding contributions via an LLM merge, which fills a practical gap but risks embedding model bias into the reference codebook.

The core of this paper is a computational approach to evaluating inductive coding in qualitative analysis when both humans and generative AI are involved. The authors merge individual codebooks with an LLM-enriched algorithm and then use four new metrics (Coverage, Overlap, Novelty, and Divergence) to assess how much each coder adds to the combined result.

What the paper does well is fill a gap in a practical way. Inductive coding is exploratory by nature, so standard accuracy metrics don't apply, and manual review is slow. The method gives a way to quantify contributions and spot issues such as too many codes or irrelevant ones. The two experiments on an online conversation dataset support the claims that the metrics are stable across runs and across different LLMs, and that they can diagnose coding problems.

The main concern is the one in the stress test: all metrics depend on a single merged codebook produced by the LLM procedure. If the LLM prefers certain ways of naming or grouping codes, coders who happen to match those preferences will score higher on Coverage and lower on Divergence, even if their work is equally exploratory. Testing stability across LLMs helps, but it doesn't prove the reference is unbiased or that it aligns with human judgment. The paper would be stronger with a check against a human-merged version, or with inter-rater agreement on the merge step itself.

This work is aimed at qualitative researchers in the social sciences and education who are incorporating AI into their analysis. Someone looking for tools to audit coding processes in human-AI teams would find it relevant. It is worth sending for peer review because it tackles a methodological issue that is becoming more common, and the ideas are grounded enough to benefit from referee feedback on bias and validation.

Referee Report

2 major / 1 minor

Summary. The paper proposes a computational method for evaluating inductive ('open') coding in qualitative analysis. It merges individual codebooks via an LLM-enriched algorithm and then scores each coder's contribution using four novel metrics (Coverage, Overlap, Novelty, Divergence) computed against the merged reference. Two experiments on a human-coded online conversation dataset are reported to demonstrate the merging algorithm's effects, the metrics' stability across runs and LLMs, and their diagnostic utility for issues such as excessive or hallucinated codes.

Significance. If the metrics can be shown to be stable and free of systematic LLM-induced bias, the approach would supply a scalable, ground-truth-free way to assess exploratory coding quality in human-AI workflows, addressing a recognized methodological gap in social-science qualitative research.

major comments (2)

[Abstract / merging algorithm] Abstract and method description of the merging algorithm: all four metrics are defined relative to a single LLM-enriched merged codebook. This construction risks circular dependence; if the merge systematically favors certain phrasings, abstraction levels, or granularities (as LLMs are known to do), then Coverage and Divergence scores will partly reflect alignment with the model's priors rather than intrinsic exploratory quality. The reported stability across multiple LLMs does not demonstrate neutrality of the reference, and no external human-merged reference or inter-rater comparison against the LLM merge is described.
[Abstract / validation experiments] Validation experiments (Abstract): the claims of stability, robustness, and diagnostic power rest on two experiments whose data details, exact metric equations, exclusion rules, sample sizes, and error analysis are not supplied in the provided description. Without these, it is impossible to evaluate whether the reported stability is load-bearing evidence or an artifact of the chosen dataset and LLM family.

minor comments (1)

[Abstract] The acronym GAI is introduced without prior expansion; a single sentence defining it on first use would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying aspects of the method and experiments while committing to revisions that strengthen the manuscript.

Referee: [Abstract / merging algorithm] Abstract and method description of the merging algorithm: all four metrics are defined relative to a single LLM-enriched merged codebook. This construction risks circular dependence; if the merge systematically favors certain phrasings, abstraction levels, or granularities (as LLMs are known to do), then Coverage and Divergence scores will partly reflect alignment with the model's priors rather than intrinsic exploratory quality. The reported stability across multiple LLMs does not demonstrate neutrality of the reference, and no external human-merged reference or inter-rater comparison against the LLM merge is described.

Authors: We acknowledge the concern about potential circularity and LLM priors influencing the reference. The LLM-enriched merge is intended to synthesize a collective codebook from individual inductive contributions without relying on external ground truth, consistent with the exploratory goals of open coding. Stability across LLMs and runs is presented as evidence of robustness rather than full neutrality. We agree that direct comparison to a human-merged reference would provide stronger validation of the merge step and will add this analysis (including inter-rater agreement metrics between LLM and human merges) to the revised manuscript. revision: yes
Referee: [Abstract / validation experiments] Validation experiments (Abstract): the claims of stability, robustness, and diagnostic power rest on two experiments whose data details, exact metric equations, exclusion rules, sample sizes, and error analysis are not supplied in the provided description. Without these, it is impossible to evaluate whether the reported stability is load-bearing evidence or an artifact of the chosen dataset and LLM family.

Authors: The abstract summarizes the two experiments at a high level. The full manuscript provides the requested details: the online conversation dataset and human-coded codebooks (Section 4), exact equations for Coverage, Overlap, Novelty, and Divergence (Section 3), sample sizes, exclusion rules for code filtering, and error analysis of metric behavior. We will expand the abstract to include brief references to these elements and ensure all equations are explicitly stated for clarity. revision: partial
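
A comparison between an LLM merge and a human merge, as discussed in the first response above, could be scored with a pairwise agreement statistic such as the Rand index. The sketch below assumes each merge is recorded as a mapping from an original code to the merged group it landed in; the function name, group labels, and example codes are hypothetical, and the Rand index is one possible choice rather than the statistic the authors say they will use.

```python
from itertools import combinations
from typing import Dict


def rand_index(p: Dict[str, str], q: Dict[str, str]) -> float:
    """Fraction of code pairs on which two groupings agree.

    A pair counts as agreement when both groupings put the two codes together,
    or both keep them apart. Group names themselves do not need to match.
    """
    if p.keys() != q.keys():
        raise ValueError("both merges must cover the same original codes")
    pairs = list(combinations(sorted(p), 2))
    if not pairs:
        raise ValueError("need at least two codes to compare")
    agree = sum((p[a] == p[b]) == (q[a] == q[b]) for a, b in pairs)
    return agree / len(pairs)


# Hypothetical groupings of the same four original codes.
llm_merge = {
    "asking for help": "help",
    "requesting assistance": "help",
    "sharing a link": "resource",
    "venting": "emotion",
}
human_merge = {
    "asking for help": "help-seeking",
    "requesting assistance": "help-seeking",
    "sharing a link": "sharing",
    "venting": "sharing",
}
score = rand_index(llm_merge, human_merge)
print(f"Rand index: {score:.3f}")
```

Here the two merges disagree on a single pair ("sharing a link" and "venting"), so five of the six pairs agree. A chance-corrected variant would be a natural refinement for larger codebooks.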

Circularity Check

0 steps flagged

No circularity: metrics defined relative to merge by design, no derivations or self-referential reductions

full rationale

The paper describes an LLM-enriched merging algorithm followed by four metrics (Coverage, Overlap, Novelty, Divergence) computed against the merged codebook. This structure is definitional to the proposed method rather than a derivation that reduces to fitted inputs or self-citations. No equations, uniqueness theorems, or load-bearing self-citations appear in the abstract or described content. Validation experiments test stability across LLMs and runs, which is an independent empirical check. The method is self-contained against external benchmarks with no reduction of predictions to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method implicitly assumes LLM merging is neutral and metrics capture exploratory quality without further justification.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

online" 'onlinestring :=

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Anne Adams, Peter Lunt, and Paul Cairns. 2008. A Qualitative Approach to HCI Research . In Research Methods for Human - Computer Interaction . Cambridge University Press

work page 2008
[4]

Jennifer Attride-Stirling. 2001. https://doi.org/10.1177/146879410100100307 Thematic networks: an analytic tool for qualitative research . Qualitative Research, 1(3):385--405

work page doi:10.1177/146879410100100307 2001
[5]

Robert Bowman, Camille Nadal, Kellie Morrissey, Anja Thieme, and Gavin Doherty. 2023. https://doi.org/10.1145/3544548.3581203 Using Thematic Analysis in Healthcare HCI at CHI : A Scoping Review . In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems , pages 1--18, Hamburg Germany. ACM

work page doi:10.1145/3544548.3581203 2023
[6]

Virginia Braun and Victoria Clarke. 2006. https://doi.org/10.1191/1478088706qp063oa Using thematic analysis in psychology . Qualitative Research in Psychology, 3(2):77--101

work page doi:10.1191/1478088706qp063oa 2006
[7]

Virginia Braun and Victoria Clarke. 2012. https://psycnet.apa.org/record/2011-23864-004 Thematic analysis. American Psychological Association

work page 2012
[8]

Virginia Braun and Victoria Clarke. 2013. Successful qualitative research: A practical guide for beginners

work page 2013
[9]

Virginia Braun and Victoria Clarke. 2021. https://doi.org/10.1080/14780887.2020.1769238 One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qualitative Research in Psychology, 18(3):328--352

work page doi:10.1080/14780887.2020.1769238 2021
[10]

Bringer, Lynne H

Joy D. Bringer, Lynne H. Johnston, and Celia H. Brackenridge. 2004. https://doi.org/10.1177/1468794104044434 Maximizing Transparency in a Doctoral Thesis1 : The Complexities of Writing About the Use of QSR * NVIVO Within a Grounded Theory Study . Qualitative Research, 4(2):247--265

work page doi:10.1177/1468794104044434 2004
[11]

Ariel Cascio, Eunlye Lee, Nicole Vaudrin, and Darcy A

M. Ariel Cascio, Eunlye Lee, Nicole Vaudrin, and Darcy A. Freedman. 2019. https://doi.org/10.1177/1525822X19838237 A Team -based Approach to Open Coding : Considerations for Creating Intercoder Consensus . Field Methods, 31(2):116--130

work page doi:10.1177/1525822x19838237 2019
[12]

John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, and Michael Horn. 2025. Processes matter: How ml/gai approaches could support open qualitative coding of online discourse datasets. In Proceedings of the 18th International Conference on Computer-Supported Collaborative Learning-CSCL 2025, pp. 415-419. International Society ...

work page 2025
[14]

Juliet Corbin and Anselm Strauss. 2008 b . https://doi.org/10.4135/9781452230153 Chapter 14 / Criteria for Evaluation . In Basics of Qualitative Research (3rd ed.): Techniques and Procedures for Developing Grounded Theory . SAGE Publications, Inc., 2455 Teller Road, Thousand Oaks California 91320 United States

work page doi:10.4135/9781452230153 2008
[15]

Corbin and Anselm Strauss

Juliet M. Corbin and Anselm Strauss. 1990. https://idp.springer.com/authorize/casa?redirect_uri=https://link.springer.com/article/10.1007/bf00988593&casa_token=aBHJMqIs5a4AAAAA:ngulSWPiXoluZjWKFBIiPpeFVSSBQtx7ncsSpleI54sgSYiDmpFNzNPe96fXDyeVUwU1YO-miYiL3q_d Grounded theory research: Procedures , canons, and evaluative criteria . Qualitative sociology, 13(...

work page doi:10.1007/bf00988593 1990
[16]

Shih-Chieh Dai, Aiping Xiong, and Lun-Wei Ku. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.669 LLM -in-the-loop: Leveraging large language model for thematic analysis . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9993--10001, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.findings-emnlp.669 2023
[17]

Stefano De Paoli. 2023 a . https://doi.org/10.48550/arXiv.2305.13014 Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model . arXiv preprint. ArXiv:2305.13014 [cs]

work page doi:10.48550/arxiv.2305.13014 2023
[18]

Stefano De Paoli. 2023 b . https://doi.org/10.1177/08944393231220483 Performing an Inductive Thematic Analysis of Semi - Structured Interviews With a Large Language Model : An Exploration and Provocation on the Limits of the Approach . Social Science Computer Review, 0(0):1--23

work page doi:10.1177/08944393231220483 2023
[19]

Dominic Furniss, Ann Blandford, and Paul Curzon. 2011. https://doi.org/10.1145/1978942.1978960 Confessions from a grounded theory PhD : experiences and lessons learnt . In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems , pages 113--122, Vancouver BC Canada. ACM

work page doi:10.1145/1978942.1978960 2011
[20]

Glassman, and Toby Jia-Jun Li

Simret Araya Gebreegziabher, Zheng Zhang, Xiaohang Tang, Yihao Meng, Elena L. Glassman, and Toby Jia-Jun Li. 2023. https://doi.org/10.1145/3544548.3581352 PaTAT : Human - AI Collaborative Qualitative Coding with Explainable Interactive Rule Synthesis . In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems , CHI '23, pages 1--19, ...

work page doi:10.1145/3544548.3581352 2023
[21]

Maarten Grootendorst. 2022. Bertopic: Neural topic modeling with a class-based tf-idf procedure. arXiv preprint arXiv:2203.05794

work page internal anchor Pith review Pith/arXiv arXiv 2022
[22]

Sean Lee, Aamir Shakir, Darius Koenig, and Julius Lipp. 2024. https://www.mixedbread.ai/blog/mxbai-embed-large-v1 Open source strikes bread - new fluffy embeddings model

work page 2024
[23]

Xianming Li and Jing Li. 2023. Angle-optimized text embeddings. arXiv preprint arXiv:2309.12871

work page arXiv 2023
[24]

Jasy Suet Yan Liew, Nancy McCracken, Shichun Zhou, and Kevin Crowston. 2014. https://doi.org/10.3115/v1/W14-2513 Optimizing Features in Active Machine Learning for Complex Qualitative Content Analysis . In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science , pages 44--48, Baltimore, MD, USA. Association for Comp...

work page doi:10.3115/v1/w14-2513 2014
[25]

Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. https://doi.org/10.1145/3359174 Reliability and Inter -rater Reliability in Qualitative Research : Norms and Guidelines for CSCW and HCI Practice . Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1--23

work page doi:10.1145/3359174 2019
[26]

Angelina Parfenova, Andreas Marfurt, J \"u rgen Pfeffer, and Alexander Denzler. 2025. Text annotation via inductive coding: Comparing human experts to llms in qualitative data analysis. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6456--6469

work page 2025
[27]

Hamed Rahimi, Jacob Louis Hoover, David Mimno, Hubert Naacke, Camelia Constantin, and Bernd Amann. 2023. Contextualized topic coherence metrics. arXiv preprint arXiv:2305.14587

work page arXiv 2023
[28]

Testing and Assessment

Md Shidur Rahman. 2016. https://doi.org/10.5539/jel.v6n1p102 The Advantages and Disadvantages of Using Qualitative and Quantitative Approaches and Methods in Language “ Testing and Assessment ” Research : A Literature Review . Journal of Education and Learning, 6(1):102

work page doi:10.5539/jel.v6n1p102 2016
[29]

Tim Rietz and Alexander Maedche. 2021. https://doi.org/10.1145/3411764.3445591 Cody: An AI - Based System to Semi - Automate Coding for Qualitative Research . In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems , CHI '21, pages 1--14, New York, NY, USA. Association for Computing Machinery

work page doi:10.1145/3411764.3445591 2021
[30]

Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad, and Marcia Moraes. 2023. Automated code extraction from discussion. In Advances in Quantitative Ethnography: 4th International Conference, ICQE 2022, Copenhagen, Denmark, October 15--19, 2022, Proceedings, page 227. Springer Nature

work page 2023
[31]

Benjamin Saunders, Julius Sim, Tom Kingstone, Shula Baker, Jackie Waterfield, Bernadette Bartlam, Heather Burroughs, and Clare Jinks. 2018. https://doi.org/10.1007/s11135-017-0574-8 Saturation in qualitative research: exploring its conceptualization and operationalization . Quality & Quantity, 52(4):1893--1907

work page doi:10.1007/s11135-017-0574-8 2018
[32]

Carson Sievert and Kenneth Shirley. 2014. Ldavis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces, pages 63--70

work page 2014
[33]

Ravi Sinha, Idris Solola, Ha Nguyen, Hillary Swanson, and LuEttaMae Lawrence. 2024. https://doi.org/10.1145/3663433.3663456 The Role of Generative AI in Qualitative Research : GPT -4's Contributions to a Grounded Theory Analysis . In Proceedings of the Symposium on Learning , Design and Technology , pages 17--25, Delft Netherlands. ACM

work page doi:10.1145/3663433.3663456 2024
[34]

Cesare Spinoso-Di Piano. 2023. Qualitative code suggestion: A human-centric approach to qualitative coding. McGill University (Canada)

work page 2023
[35]

Anselm Strauss and Juliet Corbin. 1998. Basics of qualitative research: Techniques and procedures for developing grounded theory, 2nd ed. Basics of qualitative research: Techniques and procedures for developing grounded theory, 2nd ed., pages xiii, 312--xiii, 312. Place: Thousand Oaks, CA, US Publisher: Sage Publications, Inc

work page 1998
[36]

Gemma Team. 2025 a . https://goo.gle/Gemma3Report Gemma 3

work page 2025
[37]

Qwen Team. 2025 b . https://qwenlm.github.io/blog/qwq-32b/ Qwq-32b: Embracing the power of reinforcement learning

work page 2025
[38]

Gareth Terry, Nikki Hayfield, Victoria Clarke, and Virginia Braun. 2017. https://books.google.com/books?hl=en&lr=&id=AAniDgAAQBAJ&oi=fnd&pg=PA17&dq=Thematic+analysis+terry+&ots=dpi2nmHiMV&sig=959tII4BUp9su6Hv2JJui1KjP5Q Thematic analysis . The SAGE handbook of qualitative research in psychology, 2(17-37):25. Publisher: SAGE Publications Ltd

work page 2017
[39]

David R. Thomas. 2006. https://doi.org/10.1177/1098214005283748 A General Inductive Approach for Analyzing Qualitative Evaluation Data . American Journal of Evaluation, 27(2):237--246

work page doi:10.1177/1098214005283748 2006
[40]

Anthony G. Tuckett. 2005. https://doi.org/10.5172/conu.19.1-2.75 Applying thematic analysis theory to practice: A researcher’s experience . Contemporary Nurse, 19(1-2):75--87

work page doi:10.5172/conu.19.1-2.75 2005
[41]

Vera Liao, Rania Abdelghani, and Pierre-Yves Oudeyer

Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, and Pierre-Yves Oudeyer. 2023. https://doi.org/10.1145/3581754.3584136 Supporting Qualitative Analysis with Large Language Models : Combining Codebook with GPT -3 for Deductive Coding . In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces , IUI '23 Companion ,...

work page doi:10.1145/3581754.3584136 2023
[42]

Baker, Juhan Kim, and Nidhi Nasiar

Andres Felipe Zambrano, Xiner Liu, Amanda Barany, Ryan S. Baker, Juhan Kim, and Nidhi Nasiar. 2023. https://doi.org/10.1007/978-3-031-47014-1_32 From nCoder to ChatGPT : From Automated Coding to Refining Human Coding . In Advances in Quantitative Ethnography , Communications in Computer and Information Science , pages 470--485, Cham. Springer Nature Switzerland

work page doi:10.1007/978-3-031-47014-1_32 2023
[43]

Fengxiang Zhao, Fan Yu, and Yi Shang. 2024. A new method supporting qualitative data analysis through prompt generation for inductive coding. In 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), pages 164--169. IEEE

work page 2024

[1] [1]

online" 'onlinestring :=

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Anne Adams, Peter Lunt, and Paul Cairns. 2008. A Qualitative Approach to HCI Research . In Research Methods for Human - Computer Interaction . Cambridge University Press

work page 2008

[4] [4]

Jennifer Attride-Stirling. 2001. https://doi.org/10.1177/146879410100100307 Thematic networks: an analytic tool for qualitative research . Qualitative Research, 1(3):385--405

work page doi:10.1177/146879410100100307 2001

[5] [5]

Robert Bowman, Camille Nadal, Kellie Morrissey, Anja Thieme, and Gavin Doherty. 2023. https://doi.org/10.1145/3544548.3581203 Using Thematic Analysis in Healthcare HCI at CHI : A Scoping Review . In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems , pages 1--18, Hamburg Germany. ACM

work page doi:10.1145/3544548.3581203 2023

[6] [6]

Virginia Braun and Victoria Clarke. 2006. https://doi.org/10.1191/1478088706qp063oa Using thematic analysis in psychology . Qualitative Research in Psychology, 3(2):77--101

work page doi:10.1191/1478088706qp063oa 2006

[7] [7]

Virginia Braun and Victoria Clarke. 2012. https://psycnet.apa.org/record/2011-23864-004 Thematic analysis. American Psychological Association

work page 2012

[8] [8]

Virginia Braun and Victoria Clarke. 2013. Successful qualitative research: A practical guide for beginners

work page 2013

[9] [9]

Virginia Braun and Victoria Clarke. 2021. https://doi.org/10.1080/14780887.2020.1769238 One size fits all? What counts as quality practice in (reflexive) thematic analysis? Qualitative Research in Psychology, 18(3):328--352

work page doi:10.1080/14780887.2020.1769238 2021

[10] [10]

Bringer, Lynne H

Joy D. Bringer, Lynne H. Johnston, and Celia H. Brackenridge. 2004. https://doi.org/10.1177/1468794104044434 Maximizing Transparency in a Doctoral Thesis1 : The Complexities of Writing About the Use of QSR * NVIVO Within a Grounded Theory Study . Qualitative Research, 4(2):247--265

work page doi:10.1177/1468794104044434 2004

[11] [11]

Ariel Cascio, Eunlye Lee, Nicole Vaudrin, and Darcy A

M. Ariel Cascio, Eunlye Lee, Nicole Vaudrin, and Darcy A. Freedman. 2019. https://doi.org/10.1177/1525822X19838237 A Team -based Approach to Open Coding : Considerations for Creating Intercoder Consensus . Field Methods, 31(2):116--130

work page doi:10.1177/1525822x19838237 2019

[12] [12]

John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, and Michael Horn. 2025. Processes matter: How ml/gai approaches could support open qualitative coding of online discourse datasets. In Proceedings of the 18th International Conference on Computer-Supported Collaborative Learning-CSCL 2025, pp. 415-419. International Society ...

work page 2025

[13] [14]

Juliet Corbin and Anselm Strauss. 2008 b . https://doi.org/10.4135/9781452230153 Chapter 14 / Criteria for Evaluation . In Basics of Qualitative Research (3rd ed.): Techniques and Procedures for Developing Grounded Theory . SAGE Publications, Inc., 2455 Teller Road, Thousand Oaks California 91320 United States

work page doi:10.4135/9781452230153 2008

[14] [15]

Corbin and Anselm Strauss

Juliet M. Corbin and Anselm Strauss. 1990. https://idp.springer.com/authorize/casa?redirect_uri=https://link.springer.com/article/10.1007/bf00988593&casa_token=aBHJMqIs5a4AAAAA:ngulSWPiXoluZjWKFBIiPpeFVSSBQtx7ncsSpleI54sgSYiDmpFNzNPe96fXDyeVUwU1YO-miYiL3q_d Grounded theory research: Procedures , canons, and evaluative criteria . Qualitative sociology, 13(...

work page doi:10.1007/bf00988593 1990

[15] [16]

Shih-Chieh Dai, Aiping Xiong, and Lun-Wei Ku. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.669 LLM -in-the-loop: Leveraging large language model for thematic analysis . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9993--10001, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.findings-emnlp.669 2023

[16] [17]

Stefano De Paoli. 2023 a . https://doi.org/10.48550/arXiv.2305.13014 Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model . arXiv preprint. ArXiv:2305.13014 [cs]

work page doi:10.48550/arxiv.2305.13014 2023

[17] [18]

Stefano De Paoli. 2023 b . https://doi.org/10.1177/08944393231220483 Performing an Inductive Thematic Analysis of Semi - Structured Interviews With a Large Language Model : An Exploration and Provocation on the Limits of the Approach . Social Science Computer Review, 0(0):1--23

[18] [19]

Dominic Furniss, Ann Blandford, and Paul Curzon. 2011. https://doi.org/10.1145/1978942.1978960 Confessions from a grounded theory PhD: experiences and lessons learnt. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 113--122, Vancouver BC Canada. ACM

[19] [20]

Simret Araya Gebreegziabher, Zheng Zhang, Xiaohang Tang, Yihao Meng, Elena L. Glassman, and Toby Jia-Jun Li. 2023. https://doi.org/10.1145/3544548.3581352 PaTAT: Human-AI Collaborative Qualitative Coding with Explainable Interactive Rule Synthesis. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI '23, pages 1--19, ...

[20] [21]

Maarten Grootendorst. 2022. Bertopic: Neural topic modeling with a class-based tf-idf procedure. arXiv preprint arXiv:2203.05794

[21] [22]

Sean Lee, Aamir Shakir, Darius Koenig, and Julius Lipp. 2024. https://www.mixedbread.ai/blog/mxbai-embed-large-v1 Open source strikes bread - new fluffy embeddings model

[22] [23]

Xianming Li and Jing Li. 2023. Angle-optimized text embeddings. arXiv preprint arXiv:2309.12871

[23] [24]

Jasy Suet Yan Liew, Nancy McCracken, Shichun Zhou, and Kevin Crowston. 2014. https://doi.org/10.3115/v1/W14-2513 Optimizing Features in Active Machine Learning for Complex Qualitative Content Analysis. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pages 44--48, Baltimore, MD, USA. Association for Comp...

[24] [25]

Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. https://doi.org/10.1145/3359174 Reliability and Inter-rater Reliability in Qualitative Research: Norms and Guidelines for CSCW and HCI Practice. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1--23

[25] [26]

Angelina Parfenova, Andreas Marfurt, Jürgen Pfeffer, and Alexander Denzler. 2025. Text annotation via inductive coding: Comparing human experts to llms in qualitative data analysis. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6456--6469

[26] [27]

Hamed Rahimi, Jacob Louis Hoover, David Mimno, Hubert Naacke, Camelia Constantin, and Bernd Amann. 2023. Contextualized topic coherence metrics. arXiv preprint arXiv:2305.14587

[27] [28]

Md Shidur Rahman. 2016. https://doi.org/10.5539/jel.v6n1p102 The Advantages and Disadvantages of Using Qualitative and Quantitative Approaches and Methods in Language “Testing and Assessment” Research: A Literature Review. Journal of Education and Learning, 6(1):102

[28] [29]

Tim Rietz and Alexander Maedche. 2021. https://doi.org/10.1145/3411764.3445591 Cody: An AI-Based System to Semi-Automate Coding for Qualitative Research. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI '21, pages 1--14, New York, NY, USA. Association for Computing Machinery

[29] [30]

Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, James Folkestad, and Marcia Moraes. 2023. Automated code extraction from discussion. In Advances in Quantitative Ethnography: 4th International Conference, ICQE 2022, Copenhagen, Denmark, October 15--19, 2022, Proceedings, page 227. Springer Nature

[30] [31]

Benjamin Saunders, Julius Sim, Tom Kingstone, Shula Baker, Jackie Waterfield, Bernadette Bartlam, Heather Burroughs, and Clare Jinks. 2018. https://doi.org/10.1007/s11135-017-0574-8 Saturation in qualitative research: exploring its conceptualization and operationalization. Quality & Quantity, 52(4):1893--1907

[31] [32]

Carson Sievert and Kenneth Shirley. 2014. Ldavis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces, pages 63--70

[32] [33]

Ravi Sinha, Idris Solola, Ha Nguyen, Hillary Swanson, and LuEttaMae Lawrence. 2024. https://doi.org/10.1145/3663433.3663456 The Role of Generative AI in Qualitative Research: GPT-4's Contributions to a Grounded Theory Analysis. In Proceedings of the Symposium on Learning, Design and Technology, pages 17--25, Delft Netherlands. ACM

[33] [34]

Cesare Spinoso-Di Piano. 2023. Qualitative code suggestion: A human-centric approach to qualitative coding. McGill University (Canada)

[34] [35]

Anselm Strauss and Juliet Corbin. 1998. Basics of qualitative research: Techniques and procedures for developing grounded theory, 2nd ed. Pages xiii, 312. Sage Publications, Inc, Thousand Oaks, CA, US

[35] [36]

Gemma Team. 2025a. https://goo.gle/Gemma3Report Gemma 3

[36] [37]

Qwen Team. 2025b. https://qwenlm.github.io/blog/qwq-32b/ Qwq-32b: Embracing the power of reinforcement learning

[37] [38]

Gareth Terry, Nikki Hayfield, Victoria Clarke, and Virginia Braun. 2017. https://books.google.com/books?hl=en&lr=&id=AAniDgAAQBAJ&oi=fnd&pg=PA17&dq=Thematic+analysis+terry+&ots=dpi2nmHiMV&sig=959tII4BUp9su6Hv2JJui1KjP5Q Thematic analysis. The SAGE handbook of qualitative research in psychology, 2(17-37):25. Publisher: SAGE Publications Ltd

[38] [39]

David R. Thomas. 2006. https://doi.org/10.1177/1098214005283748 A General Inductive Approach for Analyzing Qualitative Evaluation Data. American Journal of Evaluation, 27(2):237--246

[39] [40]

Anthony G. Tuckett. 2005. https://doi.org/10.5172/conu.19.1-2.75 Applying thematic analysis theory to practice: A researcher’s experience. Contemporary Nurse, 19(1-2):75--87

[40] [41]

Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, and Pierre-Yves Oudeyer. 2023. https://doi.org/10.1145/3581754.3584136 Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, IUI '23 Companion,...

[41] [42]

Andres Felipe Zambrano, Xiner Liu, Amanda Barany, Ryan S. Baker, Juhan Kim, and Nidhi Nasiar. 2023. https://doi.org/10.1007/978-3-031-47014-1_32 From nCoder to ChatGPT: From Automated Coding to Refining Human Coding. In Advances in Quantitative Ethnography, Communications in Computer and Information Science, pages 470--485, Cham. Springer Nature Switzerland

[42] [43]

Fengxiang Zhao, Fan Yu, and Yi Shang. 2024. A new method supporting qualitative data analysis through prompt generation for inductive coding. In 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), pages 164--169. IEEE
