pith. machine review for the scientific record.

arxiv: 2604.12337 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.CY

Recognition: unknown

Identifying and Mitigating Gender Cues in Academic Recommendation Letters: An Interpretability Case Study

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3

classification 💻 cs.LG cs.CY
keywords gender bias · letters of recommendation · interpretability · text classification · machine learning · academic admissions · bias mitigation · natural language processing

The pith

Even after names and pronouns are removed, models can predict applicant gender from recommendation letters at up to 68 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether letters of recommendation for medical residencies still reveal applicant gender once explicit markers like names and pronouns are stripped away. Classifiers built on DistilBERT, RoBERTa, and Llama 2 reach well above chance on these scrubbed texts, and interpretation tools flag recurring word choices that function as gender proxies. When those terms are removed, accuracy drops several points but remains better than random. The authors therefore argue that letters carry implicit gender signals that are difficult to eliminate and that could influence hiring or admissions decisions.

Core claim

Transformer-based models trained to classify the gender of de-gendered letters of recommendation achieve up to 68 percent accuracy, with TF-IDF and SHAP highlighting terms such as emotional and humanitarian as strong female-associated signals. Removing the most predictive of these linguistic patterns produces a measurable drop in classifier performance, yet gender prediction stays above chance. The study concludes that recommendation letters contain persistent gender-identifying cues beyond explicit identifiers.

What carries the argument

Gender-classification models (DistilBERT, RoBERTa, Llama 2) paired with TF-IDF and SHAP interpretability to surface and then excise implicit gender cues from anonymized letters.

If this is right

  • Recommendation letters contain gender cues that survive removal of names and pronouns and can activate downstream bias.
  • Targeted removal of the identified linguistic patterns reduces but does not eliminate models' ability to detect gender.
  • A concrete technical process for producing more gender-neutral letters is demonstrated.
  • Auditing the text of evaluative documents is required in addition to model-level fairness interventions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same patterns may appear in other evaluative texts such as performance reviews or grant letters.
  • Human readers could unconsciously respond to the identical word-choice signals during letter review.
  • Residual predictive power after cue removal may reflect deeper structural differences in how letters are composed for different genders.

Load-bearing premise

That the de-gendering step plus removal of the flagged terms leaves no other residual gender information and that the observed performance changes are caused by the removal of those cues rather than dataset or training artifacts.

What would settle it

A new collection of letters in which the identified gender-associated phrases have been systematically replaced or deleted, followed by retraining the same classifiers and checking whether accuracy falls to chance level.
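The "accuracy falls to chance" check can be made precise with an exact one-sided binomial test of whether k correct predictions out of n test letters exceed the 50% chance rate. The test-set size below is a hypothetical stand-in; the paper's split is not reported in the abstract.

```python
# Minimal sketch of the chance-level check: exact one-sided binomial
# test against p = 0.5. Sample sizes are invented for illustration.
from math import comb

def binom_p_value(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# E.g. 68% accuracy on a hypothetical 200-letter test set:
p_before = binom_p_value(136, 200)  # far below 0.05
# After cue removal, accuracy drops ~5.5 points (to ~62.5%):
p_after = binom_p_value(125, 200)   # still significant at this n
print(f"{p_before:.2e}  {p_after:.2e}")
```

Note that whether a given accuracy "remains above chance" depends heavily on n: 62.5% is decisive at n = 200 but indistinguishable from chance at n = 30, which is why the referee's request for split sizes matters.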

Figures

Figures reproduced from arXiv: 2604.12337 by Arushi Sharma, Bailey Russo, Charlotte S. Alexander, Mlen-Too Wesley, Sayak Chakrabarty, Shane Storks, Souradip Pal.

Figure 1: Overview of our case-study pipeline: corpus construction, explicit de-gendering, model training, interpretability… view at source ↗
Figure 2: Top 10 SHAP tokens with their corresponding values (+/−) for both genders (male/female), grouped by… view at source ↗
Figure 3: SHAP summary plot showing the most influential topics in predicting gender from recommendation letters. view at source ↗
Original abstract

Letters of recommendation (LoRs) can carry patterns of implicitly gendered language that can inadvertently influence downstream decisions, e.g. in hiring and admissions. In this work, we investigate the extent to which Transformer-based encoder models as well as Large Language Models (LLMs) can infer the gender of applicants in academic LoRs submitted to an U.S. medical-residency program after explicit identifiers like names and pronouns are de-gendered. While using three models (DistilBERT, RoBERTa, and Llama 2) to classify the gender of anonymized and de-gendered LoRs, significant gender leakage was observed as evident from up to 68% classification accuracy. Text interpretation methods, like TF-IDF and SHAP, demonstrate that certain linguistic patterns are strong proxies for gender, e.g. "emotional'' and "humanitarian'' are commonly associated with LoRs from female applicants. As an experiment in creating truly gender-neutral LoRs, these implicit gender cues were remove resulting in a drop of up to 5.5% accuracy and 2.7% macro $F_1$ score on re-training the classifiers. However, applicant gender prediction still remains better than chance. In this case study, our findings highlight that 1) LoRs contain gender-identifying cues that are hard to remove and may activate bias in decision-making and 2) while our technical framework may be a concrete step toward fairer academic and professional evaluations, future work is needed to interrogate the role that gender plays in LoR review. Taken together, our findings motivate upstream auditing of evaluative text in real-world academic letters of recommendation as a necessary complement to model-level fairness interventions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper investigates implicit gender cues in de-gendered letters of recommendation (LoRs) submitted to a U.S. medical residency program. After removing explicit identifiers such as names and pronouns, the authors train DistilBERT, RoBERTa, and Llama 2 classifiers to predict applicant gender from the remaining text, reporting up to 68% accuracy. TF-IDF and SHAP are used to identify linguistic proxies for gender (e.g., 'emotional' and 'humanitarian' associated with female applicants). Removing these cues and retraining yields drops of up to 5.5% accuracy and 2.7% macro F1, yet performance remains above chance. The work concludes that gender cues in LoRs are difficult to eliminate and motivates upstream auditing of evaluative text for fairness.

Significance. If the empirical measurements are robust, the study provides concrete evidence that standard de-gendering is insufficient to remove gender signals from real-world academic letters and that interpretability tools can surface actionable proxies. This strengthens the case for auditing source documents in high-stakes domains rather than relying solely on model-level interventions, and it offers a reproducible template (TF-IDF + SHAP + targeted ablation) that other fairness audits could adapt.

major comments (4)
  1. [Abstract / Experimental setup] Abstract and experimental setup: the reported peak accuracy of 68% and the post-removal drops (5.5% accuracy, 2.7% F1) are presented without dataset size, class balance, train/test split details, or any statistical test (e.g., binomial confidence interval or permutation test) showing the result exceeds chance. These omissions make it impossible to judge whether the leakage claim is reliable or whether the mitigation effect is meaningful.
  2. [Methods / De-gendering] De-gendering procedure: the manuscript states that 'explicit identifiers like names and pronouns are de-gendered' but provides neither the exact token list or NER rules used nor any post-processing audit confirming that no gendered tokens remain. Without this validation, the observed classifier performance could be driven by residual explicit signals rather than the implicit linguistic cues the authors highlight.
  3. [Results / Mitigation experiment] Cue-removal experiment: the 5.5% accuracy drop after removing TF-IDF/SHAP-selected terms is reported without a control ablation (e.g., removing an equal number of randomly chosen words of matched frequency or using a different feature-selection method). Consequently, it is unclear whether the performance reduction is causally attributable to the identified gender proxies or to nonspecific signal loss.
  4. [Methods / Model details] Llama 2 classification: the procedure for obtaining gender predictions from Llama 2 (prompt template, decoding strategy, output parsing, or fine-tuning details) is not described, preventing assessment of whether the 68% figure is comparable to the encoder-only models or reproducible.
minor comments (2)
  1. [Abstract] The abstract states 'these implicit gender cues were remove resulting in' – the verb form should be corrected to 'were removed, resulting in'.
  2. [Abstract] It would be helpful to state explicitly which of the three models achieves the 68% accuracy and which achieves the largest post-mitigation drop.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript investigating implicit gender cues in de-gendered letters of recommendation. We address each major comment point by point below, indicating where we will revise the manuscript to improve transparency, reproducibility, and rigor.

Point-by-point responses
  1. Referee: [Abstract / Experimental setup] Abstract and experimental setup: the reported peak accuracy of 68% and the post-removal drops (5.5% accuracy, 2.7% F1) are presented without dataset size, class balance, train/test split details, or any statistical test (e.g., binomial confidence interval or permutation test) showing the result exceeds chance. These omissions make it impossible to judge whether the leakage claim is reliable or whether the mitigation effect is meaningful.

    Authors: We agree that these details are necessary for assessing reliability. We will revise the abstract and experimental setup section to explicitly report the dataset size, class balance, and train/test split. We will also add binomial confidence intervals for the accuracy and F1 scores along with a permutation test to statistically confirm that performance exceeds chance. These changes will be incorporated in the next version. revision: yes

  2. Referee: [Methods / De-gendering] De-gendering procedure: the manuscript states that 'explicit identifiers like names and pronouns are de-gendered' but provides neither the exact token list or NER rules used nor any post-processing audit confirming that no gendered tokens remain. Without this validation, the observed classifier performance could be driven by residual explicit signals rather than the implicit linguistic cues the authors highlight.

    Authors: We agree that greater detail on the de-gendering procedure is required to rule out residual explicit signals. We will expand the Methods section to include the complete list of tokens and pronouns replaced, the NER rules employed, and the results of a post-processing audit on a sample of letters confirming the absence of gendered identifiers. These additions will be made in the revision. revision: yes

  3. Referee: [Results / Mitigation experiment] Cue-removal experiment: the 5.5% accuracy drop after removing TF-IDF/SHAP-selected terms is reported without a control ablation (e.g., removing an equal number of randomly chosen words of matched frequency or using a different feature-selection method). Consequently, it is unclear whether the performance reduction is causally attributable to the identified gender proxies or to nonspecific signal loss.

    Authors: The referee is correct that a control ablation is absent. Although the terms were selected via targeted interpretability methods (TF-IDF and SHAP) specifically linked to gender prediction, we acknowledge that a random-removal baseline would strengthen the causal interpretation. We will add such a control experiment—removing an equal number of frequency-matched random words—and report the comparative results in the revised Results section. revision: yes

  4. Referee: [Methods / Model details] Llama 2 classification: the procedure for obtaining gender predictions from Llama 2 (prompt template, decoding strategy, output parsing, or fine-tuning details) is not described, preventing assessment of whether the 68% figure is comparable to the encoder-only models or reproducible.

    Authors: We agree that the Llama 2 procedure must be fully specified for reproducibility and comparability. We will revise the Methods section to include the exact zero-shot prompt template, decoding strategy (greedy), output parsing rules, and confirmation that no fine-tuning was performed. These details will be added in the next version. revision: yes
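The frequency-matched random-removal control promised in response 3 could be sketched as follows. The helper names, tolerance parameter, and corpus are illustrative placeholders, not the authors' implementation; the idea is to delete an equal number of random words whose corpus frequency matches the flagged proxies, so a drop specific to the proxies can be separated from generic signal loss.

```python
# Sketch of a frequency-matched random-removal control ablation.
import random
from collections import Counter

def matched_random_words(corpus, cue_words, tolerance=0.5, seed=0):
    """For each cue word, sample a non-cue word with a similar corpus count."""
    counts = Counter(w for doc in corpus for w in doc.split())
    rng = random.Random(seed)
    controls = []
    for cue in cue_words:
        target = counts[cue]
        candidates = [w for w, c in counts.items()
                      if w not in cue_words and w not in controls
                      and abs(c - target) <= tolerance * max(target, 1)]
        controls.append(rng.choice(candidates))
    return controls

def remove_words(corpus, words):
    """Delete every occurrence of the given words from each document."""
    banned = set(words)
    return [" ".join(w for w in doc.split() if w not in banned)
            for doc in corpus]
```

Retraining once on `remove_words(corpus, cues)` and once on `remove_words(corpus, controls)` gives the paired comparison the referee asks for: only the first should produce the reported accuracy drop if the proxies carry genuine gender signal.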

Circularity Check

0 steps flagged

No circularity: empirical measurements on held-out data

full rationale

The paper's core pipeline consists of training off-the-shelf classifiers (DistilBERT, RoBERTa, Llama 2) on explicitly de-gendered letters, measuring accuracy on held-out test letters, applying post-hoc TF-IDF and SHAP to surface features, removing those features, and re-training to observe the accuracy drop. These quantities are measured outcomes on independent data splits and do not reduce by construction to quantities defined from the same fitted parameters or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text; the claims rest on observable classification performance rather than tautological renaming or fitting.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that de-gendering removes all explicit signals and that observed accuracy reflects genuine implicit cues rather than dataset artifacts or model biases; no new physical or mathematical entities are introduced.

free parameters (2)
  • Cue selection threshold
    Words flagged by SHAP/TF-IDF for removal are chosen based on importance scores whose cutoff is not specified in the abstract.
  • Model training hyperparameters
    Standard fine-tuning parameters for DistilBERT, RoBERTa, and Llama 2 are used but not enumerated.
axioms (2)
  • domain assumption De-gendered letters contain no residual explicit gender identifiers
    Invoked when claiming that classification relies solely on implicit cues after name/pronoun removal.
  • domain assumption Classification accuracy above chance indicates actionable gender leakage
    Links model performance directly to real-world bias risk without additional validation.

pith-pipeline@v0.9.0 · 5636 in / 1499 out tokens · 35114 ms · 2026-05-10T15:55:24.651108+00:00 · methodology

