pith. sign in

arxiv: 2606.07969 · v1 · pith:ZHVBENCInew · submitted 2026-06-06 · 💻 cs.CL · cs.AI

Neutrality Bites: Gender Representation in AI-Generated Animal Stories

Pith reviewed 2026-06-27 20:11 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords gender biaslarge language modelsanimal storiesneutralitymasculine biasAI-generated narrativesidentity erasure
0
0 comments X

The pith

Large language models assign masculine gender to animal characters far more often than feminine when they assign gender at all.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests six leading LLMs by prompting them to complete stories about seven anthropomorphic animals whose gender is left unstated. Models often avoid assigning any gender or use neutral pronouns, but when they do specify gender the results show masculine characters in 40.6 percent of stories and feminine characters in only 2.2 percent. The pattern holds across four narrative settings and a range of temperatures. The authors conclude that efforts to achieve neutrality can produce the erasure of feminine and other marginalized identities rather than balanced representation. They argue that new approaches are needed to distribute social possibilities more evenly across imagined subjects.

Core claim

Across 23.8K stories, models avoid gendering the animal character in 19 percent of cases on average and use gender-neutral language in 38.2 percent, yet assign masculine gender in 40.6 percent and feminine gender in 2.2 percent when gender is assigned; the authors therefore claim that neutrality in LLMs contributes to the erasure of marginalized perspectives and identities.

What carries the argument

Prompting LLMs to complete English-language stories about anthropomorphic animals with unstated gender, then measuring the resulting patterns of gender assignment versus neutrality.

Load-bearing premise

The six chosen LLMs, seven animals, four settings, and temperature values are representative enough to support general claims about how LLMs assign gender in ambiguous narrative contexts.

What would settle it

A larger study using additional models or prompts that finds feminine animal characters appearing at rates equal to or higher than masculine ones when gender is assigned.

Figures

Figures reproduced from arXiv: 2606.07969 by Imani Finkley, Melanie Walsh, Yuanxi Li.

Figure 1
Figure 1. Figure 1: Example prompt and response from GPT-4o with the following parameters: bear (subject), farm (setting), and 1.0 [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Gender distribution results from 1327 human survey respondents given a story-completion task with varying [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Gender assignment distribution results from 23.8K text outputs from six LLMs given a story-completion task with [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Gender assignment distribution results when gender-neutral representation is not considered, highlighting a significant [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Gender assignment distribution across narrative settings for generated stories. [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Gender assignment distribution across temperatures for generated stories. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
read the original abstract

Gender bias in AI-generated stories is a well-documented problem. While much attention has been paid to reducing or mitigating this bias, it is not always clear whether interventions produce genuinely fairer results. To investigate this issue, we examine how large language models (LLMs) handle gender assignment in a narrative context that is popular, highly ambiguous, and also known to closely reproduce human stereotypes: stories about talking animals. We prompt six leading LLMs to complete an English-language story about seven different anthropomorphic animal characters whose gender is unstated. We additionally iterate with four different narrative settings and a range of model temperatures. Across the 23.8K stories, we find that models frequently avoid gendering the animal character in the story (19% on average) or use gender-neutral language like "it" or "its" (38.2% on average). However, when gender is assigned, there is a significant masculine bias. Feminine animal characters are virtually absent, present in just 2.2% of stories vs. 40.6% that feature masculine characters. Our findings point to a broader argument: neutrality bites. In other words, models that prioritize neutrality to address social bias may actually contribute to the erasure of marginalized perspectives and identities. We suggest that alternative strategies beyond neutrality need to be pursued, such as ones that more equally distribute social possibilities across imagined subjects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that LLMs prompted to generate stories about seven anthropomorphic animals with unstated gender (across six models, four narrative settings, and varied temperatures, yielding 23.8K stories) frequently avoid explicit gendering (19% average) or use neutral pronouns (38.2% average). When gender is assigned, a strong masculine bias appears (40.6% masculine characters vs. 2.2% feminine). The authors argue this shows that neutrality-focused approaches can erase marginalized identities and call for alternative strategies.

Significance. If the gender classification procedure proves reliable, the scale of the empirical counts provides concrete evidence that LLMs default to masculine or neutral representations in ambiguous narrative contexts, supporting the broader claim that neutrality can contribute to underrepresentation. The direct generation of 23.8K stories without fitted parameters is a methodological strength for reproducibility.

major comments (3)
  1. [Methodology] Methodology section: The procedure for classifying generated stories as masculine, feminine, neutral, or ungendered is not described, including any annotation guidelines, inter-annotator agreement scores, validation against human judgments, or handling of ambiguous cases. This directly affects the reliability of the headline 2.2% and 40.6% figures.
  2. [Results] Results section: No per-model, per-animal, or per-setting breakdowns or variance measures are reported for the gender assignment rates. Without these, it is impossible to determine whether the masculine bias generalizes beyond the specific choice of six LLMs and seven animals or is driven by particular combinations.
  3. [Abstract] Abstract and Results: The reported percentages lack error bars, confidence intervals, or robustness checks across temperature values, leaving the central claim of 'significant masculine bias' vulnerable to unexamined measurement and sampling choices.
minor comments (1)
  1. [Abstract] The abstract states the total story count but provides no details on the exact prompting template or how stories were deduplicated or filtered.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for improving the clarity and robustness of our manuscript. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses
  1. Referee: [Methodology] Methodology section: The procedure for classifying generated stories as masculine, feminine, neutral, or ungendered is not described, including any annotation guidelines, inter-annotator agreement scores, validation against human judgments, or handling of ambiguous cases. This directly affects the reliability of the headline 2.2% and 40.6% figures.

    Authors: We agree that the classification procedure must be described in detail for the results to be interpretable. The current version of the manuscript does not include this information. In the revision, we will add a dedicated subsection to the methodology that specifies the exact rules for classifying stories (based on pronoun usage and explicit gender markers), how ambiguous cases were resolved, the annotation guidelines provided to any human coders, inter-annotator agreement statistics, and the results of a validation exercise against human judgments on a held-out sample. revision: yes

  2. Referee: [Results] Results section: No per-model, per-animal, or per-setting breakdowns or variance measures are reported for the gender assignment rates. Without these, it is impossible to determine whether the masculine bias generalizes beyond the specific choice of six LLMs and seven animals or is driven by particular combinations.

    Authors: The referee is correct that aggregate figures alone are insufficient to assess generalizability. We will revise the results section to include tables or supplementary figures with per-model, per-animal, and per-narrative-setting breakdowns of the gender assignment rates. We will also report variance measures (e.g., standard deviations across temperature settings) to allow readers to evaluate consistency. revision: yes

  3. Referee: [Abstract] Abstract and Results: The reported percentages lack error bars, confidence intervals, or robustness checks across temperature values, leaving the central claim of 'significant masculine bias' vulnerable to unexamined measurement and sampling choices.

    Authors: We accept this criticism. The revised manuscript will include error bars or confidence intervals for the key percentages in both the abstract and results. We will also add a short robustness subsection that reports gender assignment rates across the range of temperature values used, confirming that the masculine bias pattern is stable. revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical counts from generated text

full rationale

The paper performs an empirical study by prompting six LLMs to generate stories about seven animals in four settings, then manually or automatically classifying the 23.8K outputs for gender assignment (masculine, feminine, neutral, or avoided). The headline percentages (40.6% masculine, 2.2% feminine when gendered) are simple frequency counts of observed tokens and pronouns in the generated text; no equations, fitted parameters, predictions, or self-citations are used to derive these figures from the inputs. The central claim is therefore a direct report of the experimental data rather than a reduction of the data to itself by construction. The selection of models/animals/settings is a methodological choice whose representativeness can be debated on external grounds, but it does not create a circular derivation chain.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim depends on the representativeness of the selected models and prompts plus the validity of automated or manual gender classification from generated text; no free parameters are fitted to the outcome data.

free parameters (2)
  • temperature values
    Chosen by hand to explore generation diversity; not fitted to the gender statistics.
  • selection of seven animal characters
    Chosen to ensure gender ambiguity; specific choice could influence observed bias rates.
axioms (2)
  • domain assumption LLMs produce text from which gender references can be reliably extracted and counted
    Required to convert raw generations into the reported 2.2% and 40.6% figures.
  • domain assumption The tested models and settings are representative of typical LLM narrative behavior
    Invoked when generalizing from the 23.8K stories to broader claims about AI gender handling.

pith-pipeline@v0.9.1-grok · 5777 in / 1299 out tokens · 34875 ms · 2026-06-27T20:11:20.819339+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 30 canonical work pages

  1. [1]

    Amin Abolghasemi, Leif Azzopardi, Arian Askari, Maarten de Rijke, and Suzan Verberne. 2024. Measuring Bias in a Ranked List Using Term-Based Representations. InAdvances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part V(Glasgow, United Kingdom). Springer-Verlag, Berli...

  2. [2]

    Anjali Adukia, Alex Eble, Emileigh Harrison, Hakizumwami Birali Runesha, and Teodora Szasz. 2023. What We Teach About Race and Gender: Representation in Images and Text of Children’s Books*.The Quarterly Journal of Economics138, 4 (Nov. 2023), 2225–2285. doi:10.1093/qje/qjad028

  3. [3]

    Anthropic. 2025. System Card: Claude Sonnet 4.5. https://assets.anthropic.com/m/12f214efcc2f457a/original/Claude-Sonnet-4-5-System- Card.pdf

  4. [4]

    Stuart G. Baker. 1994. The Multinomial-Poisson Transformation.Journal of the Royal Statistical Society. Series D (The Statistician)43, 4 (1994), 495–504. http://www.jstor.org/stable/2348134

  5. [5]

    and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency(Virtual Event, Canada)(FAccT ’21). Association for Computing Machinery, New York, NY, USA, 610–623. doi:10.114...

  6. [6]

    Taylor Berry and Julia Wilkins. 2017. The Gendered Portrayal of Inanimate Characters in Children’s Books. 43, 2 (2017). FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Imani Finkley, Yuanxi Li, and Melanie Walsh

  7. [7]

    Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (Technology) Is Power: A Critical Survey of “Bias” in NLP. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 5454...

  8. [8]

    Yang Trista Cao and Hal Daumé. 2021. Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle.Computational Linguistics47, 3 (Nov. 2021), 615–661. doi:10.1162/coli_a_00413

  9. [9]

    Sarah Caré. 2024. Female Animal Characters in Roald Dahl’s Children’s Books: A Misogynistic Portrayal.Miscelánea: A Journal of English and American Studies69 (June 2024), 111–130. doi:10.26754/ojs_misc/mj.20248807

  10. [10]

    Kennedy Casey, Kylee Novick, and Stella Lourenco. 2021. Sixty Years of Gender Representation in Children’s Books: Conditions Associated with Overrepresentation of Male versus Female Protagonists.PLOS ONE16 (Dec. 2021), e0260566. doi:10.1371/journal.pone.0260566

  11. [11]

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al . 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261(2025)

  12. [12]

    Hannah Devinney, Jenny Björklund, and Henrik Björklund. 2022. Theories of “Gender” in NLP Bias Research. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 2083–2102. doi:10.1145/3531146.3534627

  13. [13]

    Dimgba, Sharon Oba, Ameeta Agrawal, and Philippe J

    Martha O. Dimgba, Sharon Oba, Ameeta Agrawal, and Philippe J. Giabbanelli. 2025. Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations. arXiv:2509.04515 [cs] doi:10.48550/arXiv.2509.04515

  14. [14]

    Bufan Gao and Elisa Kreiss. 2025. Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, Ch...

  15. [15]

    Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, and Hamed Khanpour. 2023. Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models. InFindings of the Association for Computational Linguistics: ACL 2023, Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto...

  16. [16]

    Tarleton Gillespie. 2024. Generative AI and the Politics of Visibility.Big Data & Society11, 2 (June 2024), 20539517241252131. doi:10.1177/20539517241252131

  17. [17]

    Google. [n. d.]. Gemini Storybook — for the Stories Only You Could Imagine. https://gemini.google/overview/storybook/

  18. [18]

    Liz Grauerholz and Bernice Pescosolido. 1989. Gender Representation in Children’s Literature: 1900-1984.Gender & Society - GENDER SOC3 (March 1989), 113–125. doi:10.1177/089124389003001008

  19. [19]

    Zhiting He, Jiayi Su, Li Chen, Tianqi Wang, and Ray Lc. 2025. ’I Recall the Past’: Exploring How People Collaborate with Generative AI to Create Cultural Heritage Narratives.Proc. ACM Hum.-Comput. Interact.9, 2, Article CSCW108 (May 2025), 30 pages. doi:10.1145/3711006

  20. [20]

    The Mouse Looks Like a Boy

    Thomas M. Hill and Katrina Bartow Jacobs. 2020. “The Mouse Looks Like a Boy”: Young Children’s Talk About Gender Across Human and Nonhuman Characters in Picture Books.Early Childhood Education Journal48, 1 (Jan. 2020), 93–102. doi:10.1007/s10643-019-00969-x

  21. [21]

    Sture Holm. 1979. A Simple Sequentially Rejective Multiple Test Procedure.Scandinavian Journal of Statistics6, 2 (1979), 65–70. http://www.jstor.org/stable/4615733

  22. [22]

    Tamanna Hossain, Sunipa Dev, and Sameer Singh. 2023. MISGENDERED: Limits of Large Language Models in Understanding Pronouns. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canad...

  23. [23]

    Ting-Yao Hsu, Yen-Chia Hsu, and Ting-Hao (Kenneth) Huang. 2019. On How Users Edit Computer-Generated Visual Stories. InExtended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow, Scotland Uk)(CHI EA ’19). Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/3290607.3312965

  24. [24]

    Hadas Kotek, Rikker Dockum, and David Sun. 2023. Gender bias and stereotypes in Large Language Models. InProceedings of The ACM Collective Intelligence Conference(Delft, Netherlands)(CI ’23). Association for Computing Machinery, New York, NY, USA, 12–24. doi:10.1145/3582269.3615599

  25. [25]

    Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. InCHI Conference on Human Factors in Computing Systems (CHI ’22). ACM, 1–19. doi:10.1145/3491102.3502030

  26. [26]

    Molly Lewis, Matt Cooper Borkenhagen, Ellen Converse, Gary Lupyan, and Mark S Seidenberg. 2021. What Might Books Be Teaching Young Children About Gender? (2021)

  27. [27]

    Li Lucy and David Bamman. 2021. Gender and Representation Bias in GPT-3 Generated Stories. InProceedings of the Third Workshop on Narrative Understanding, Nader Akoury, Faeze Brahman, Snigdha Chaturvedi, Elizabeth Clark, Mohit Iyyer, and Lara J. Martin (Eds.). Association for Computational Linguistics, Virtual, 48–55. doi:10.18653/v1/2021.nuse-1.5

  28. [28]

    Janice McCabe, Emily Fairchild, Liz Grauerholz, Bernice Pescosolido, and Daniel Tope. 2011. Gender in Twentieth-Century Children’s Books.Gender & Society - GENDER SOC25 (March 2011), 197–226. doi:10.1177/0891243211398358 Neutrality Bites: Gender Representation in AI-Generated Animal Stories FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  29. [29]

    Jennifer Mickel, Maria De-Arteaga, Liu Leqi, and Kevin Tian. 2026. More of the Same: Persistent Representational Harms Under Increased Representation. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id= R9k13fTGP0

  30. [30]

    MistralAI. 2025. Mistral Medium 3.1 - Mistral AI. https://docs.mistral.ai/models/mistral-medium-3-1-25-08

  31. [31]

    Joao Neves, Inês Costa, Joao Oliveira, Bruno Silva, and Joana Maia. 2023. Can Gender Nouns Influence the Stereotypes of Animals? Animals : an Open Access Journal from MDPI13, 16 (Aug. 2023), 2604. doi:10.3390/ani13162604

  32. [32]

    Team Olmo, :, Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, David Graham, David Heineman, Dirk Groeneveld, Faeze Brahman, Finbarr Timbers, Hamish Ivison, Jacob Morrison, Jake Poznanski, Kyle Lo, Luca Soldaini, Matt Jordan, Mayee Chen, Michael Noukhovitch, Nathan Lambert, Pete Walsh, Pradeep Dasigi, Robert Berry, Saumya Malik, Saurabh Shah, Scott Geng, S...

  33. [33]

    arXiv:2512.13961 [cs.CL] https://arxiv.org/abs/2512.13961

    Olmo 3. arXiv:2512.13961 [cs.CL] https://arxiv.org/abs/2512.13961

  34. [34]

    OpenAI. 2024. GPT-4o System Card. https://cdn.openai.com/gpt-4o-system-card.pdf

  35. [35]

    OpenAI. 2025. GPT-5 System Card. https://cdn.openai.com/gpt-5-system-card.pdf

  36. [36]

    Joanne O’Sullivan. 2025. Google Launches Personalized Gemini Storybook App to Industry Concern. https://www.publishersweekly. com/pw/by-topic/childrens/childrens-industry-news/article/98452-google-launches-personalized-gemini-storybook-app-to- industry-concern.html

  37. [37]

    Shon Otmazgin, Arie Cattan, and Yoav Goldberg. 2022. F-coref: Fast, Accurate and Easy to Use Coreference Resolution. InProceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations, Wray Buntine and Maria Liaka...

  38. [38]

    Ekstrand

    Amifa Raj and Michael D. Ekstrand. 2022. Fire Dragon and Unicorn Princess; Gender Stereotypes and Children’s Products in Search Engine Responses. arXiv. doi:10.48550/ARXIV.2206.13747 Version Number: 1

  39. [39]

    2022.Alice and Sparkle

    Ammaar Reshi. 2022.Alice and Sparkle. Independently published. Text generated using ChatGPT; Illustrations generated using Midjourney

  40. [40]

    Donya Rooein, Vilém Zouhar, Debora Nozza, and Dirk Hovy. 2025. Biased Tales: Cultural and Topic Bias in Generating Children’s Stories. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Su...

  41. [41]

    Shirin Seyedsalehi. 2025. Mitigating Gender Bias in Information Retrieval Systems. InAdvances in Information Retrieval, Claudia Hauff, Craig Macdonald, Dietmar Jannach, Gabriella Kazai, Franco Maria Nardini, Fabio Pinelli, Fabrizio Silvestri, and Nicola Tonellotto (Eds.). Vol. 15576. Springer Nature Switzerland, Cham, 227–232. doi:10.1007/978-3-031-88720-...

  42. [42]

    Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The Woman Worked as a Babysitter: On Biases in Language Generation. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Kentaro Inui, Jing Jiang, Vincent Ng, ...

  43. [43]

    Sugimoto, and Thema Monroe-White

    Evan Shieh, Faye Marie Vassel, Cassidy R. Sugimoto, and Thema Monroe-White. 2025. Laissez-Faire Harms: Algorithmic Biases in Generative Language Models (Extended Abstract).Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 3 (Oct. 2025), 2373–2374. doi:10.1609/aies.v8i3.36722

  44. [44]

    Laura Spillner. 2024. Unexpected Gender Stereotypes in AI-generated Stories: Hairdressers Are Female, but so Are Doctors. InProceedings of the Text2Story’24 Workshop. Glasgow, Scotland. https://ceur-ws.org/Vol-3671/paper10.pdf

  45. [45]

    Goya van Boven, Yupei Du, and Dong Nguyen. 2024. Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 2470–2483. doi:10.1145/3630106.3659049

  46. [46]

    Melanie Walsh, Russell Samora, Michelle Pera-McGhee, and Jan Diehm. 2025. Bears Will Be Boys: A data analysis of animal gender in children’s books. https://pudding.cool/2025/07/kids-books/.The Pudding(July 2025)

  47. [47]

    Jules Watson, Xi Wang, Raymond Liu, Suzanne Stevenson, and Barend Beekhuizen. 2025. Analyzing values about gendered language reform in LLMs’ revisions. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computationa...

  48. [48]

    Jennie Yabroff. 2016. Why Are There so Few Girls in Children’s Books?The Washington Post(Jan. 2016). FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Imani Finkley, Yuanxi Li, and Melanie Walsh

  49. [49]

    Foulds, and Shimei Pan

    Tao Zhang, Ziqian Zeng, YuxiangXiao YuxiangXiao, Huiping Zhuang, Cen Chen, James R. Foulds, and Shimei Pan. 2025. GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce Nabende, Ekaterina Sh...