arxiv: 2605.07622 · v1 · submitted 2026-05-08 · 💻 cs.CL

Recognition: 2 theorem links


Is She Even Relevant? When BERT Ignores Explicit Gender Cues

Chiara Manna, Eva Vanmassenhove, Jonas Klein

Pith reviewed 2026-05-11 02:09 UTC · model grok-4.3

classification 💻 cs.CL

keywords gender bias · Dutch BERT · contextual embeddings · stereotypes · professions · male default · morphological gender


The pith

Dutch BERT fails to update gender representations when explicit cues contradict stereotypes in short sentences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks how gender information emerges during the training of a Dutch BERT model built from scratch and tests whether clear contextual signals can override learned statistical patterns. Using short sentence templates such as 'Zij is een loodgieter' ('She is a plumber'), it checks whether the model adjusts its internal representations when the cue points against the typical association. The results indicate that gender becomes linearly separable around epoch 20, yet stereotypical gender-profession pairings are still predicted far more accurately than anti-stereotypical ones, and generic forms default to a male interpretation even with explicit female referents. In other words, contextualization along the probed gender direction stays limited even in a language with overt morphological gender marking. This matters because many applications rely on these models to handle context accurately and produce fair output.

Core claim

Although gender becomes clearly linearly separable around epoch 20 and is distributed across multiple embedding dimensions, the model struggles to update its internal gender representation in light of explicit contextual cues in short sentence templates. Stereotypical gender-profession pairings are predicted far more accurately than anti-stereotypical ones, and generic forms in Dutch systematically default to a male interpretation, even when the context explicitly denotes a female referent.

What carries the argument

Dynamic gender subspaces built with linear SVMs on contextual embeddings extracted at each training checkpoint, paired with controlled sentence templates that contrast explicit gender cues against learned statistical associations.
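A minimal sketch of what checkpoint-wise linear probing can look like, under loudly stated assumptions: the "embeddings" are synthetic Gaussian vectors, the checkpoint names and separation values are invented stand-ins for training progress, and the probe is a small hand-rolled hinge-loss SGD in pure Python. The paper's actual SVM implementation, hyperparameters, and data are not given here, so none of the names below come from it.

```python
import random

def make_embeddings(n, dim, sep, rng):
    """Toy 'contextual embeddings' with labels +1 (female) / -1 (male).
    The class signal is spread over the first three dimensions."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        vec = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        for j in range(3):
            vec[j] += label * sep
        X.append(vec)
        y.append(label)
    return X, y

def score(w, b, x):
    """Signed projection of x onto the probe direction w, plus bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=30, lr=0.05, lam=0.01, seed=0):
    """Linear SVM trained by stochastic subgradient descent on the
    L2-regularized hinge loss (bias left unregularized)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, b, X[i]) < 1.0
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if violated:  # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def accuracy(w, b, X, y):
    """Fraction of examples whose projection sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y)
               if (1 if score(w, b, xi) >= 0 else -1) == yi)
    return hits / len(X)

# Hypothetical checkpoints: class separation grows as a stand-in for training.
checkpoint_sep = {"epoch_01": 0.0, "epoch_10": 0.3, "epoch_20": 1.0}
probe_acc = {}
for name, sep in checkpoint_sep.items():
    rng = random.Random(42)
    X_train, y_train = make_embeddings(200, 8, sep, rng)
    X_test, y_test = make_embeddings(100, 8, sep, rng)
    w, b = train_linear_svm(X_train, y_train)  # a fresh probe per checkpoint
    probe_acc[name] = accuracy(w, b, X_test, y_test)

print(probe_acc)
```

A fresh probe is fitted at every checkpoint, which mirrors the "dynamic subspace" idea: the direction `w` is re-estimated as the representation changes rather than fixed once. In the bias evaluation, the sign of `score(w, b, x)` for a profession token's embedding would play the role of the predicted gender.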

If this is right

Gender information becomes linearly encoded during training but does not translate into flexible use of that information in context.
Explicit female cues in anti-stereotypical contexts fail to shift the model's profession-gender predictions reliably.
Generic masculine forms continue to trigger male-default interpretations despite surrounding female referents.
Contextualization along the probed gender direction remains insufficiently dynamic throughout training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same limited contextual flexibility may appear in other morphologically gendered languages when tested with comparable short templates.
Training procedures that emphasize longer contexts could reduce the observed male-default persistence.
Debiasing efforts might need to target how subspaces evolve during training rather than only post-training adjustments.

Load-bearing premise

Short controlled sentence templates provide a sufficient test of whether the model can integrate explicit contextual cues to override statistical gender associations.

What would settle it

The model correctly representing anti-stereotypical gender cues when the same professions appear in longer, more natural sentences instead of the short templates.

Figures

Figures reproduced from arXiv: 2605.07622 by Chiara Manna, Eva Vanmassenhove, Jonas Klein.

**Figure 1.** Methodological pipeline of the gender subspace construction and the bias evaluation. At each checkpoint, a gender subspace is constructed using SVMs trained on contextual embeddings of gendered words. Bias is evaluated by projecting profession terms from controlled sentences onto these subspaces, measuring accuracy. Note: Colors do not reflect the stereotype-related color scheme defined in section 4.

**Figure 2.** Example Dutch sentence pairs used in our projection-based experiments. Each sentence follows the template “[TARGET] is een [ATTRIBUTE]” and varies by the gender of the subject (man/vrouw) and the stereotype associated with the profession. Sentences on the left are pro-stereotypical; those on the right are anti-stereotypical. For clarity, all examples use neutral (historically male) profession forms. Explic…

**Figure 3.** Six example Dutch sentences using the template “[TARGET] is een [ATTRIBUTE]” used to test gender bias in our BERT model. The examples systematically vary target gender (male or female), the grammatical gender marking of the attribute (neutral vs. female suffix), and stereotype alignment (pro-stereotypical vs. anti-stereotypical). This controlled setup enables isolation of grammatical form, target gender, a…

**Figure 4.** SVM classification accuracy over BERT training epochs, using all embedding dimensions. At each BERT checkpoint, a new SVM is fully trained on contextual embeddings to classify gender. Accuracy increases steadily and stabilizes above 0.92. It increases only slightly after epoch 20, indicating that gender becomes linearly separable in BERT’s embedding space at that point. Note: Colors do not reflect the …

**Figure 5.** SVM classification accuracy over BERT training epochs using all embedding dimensions (blue), only the best-performing dimension (orange, dimension 211), and all dimensions except the best one (green). The high accuracy of the green line shows that removing the top dimension has minimal effect, indicating that gender information is distributed across multiple embedding dimensions. Note: Colors do not reflec…

**Figure 6.** SVM classification accuracy over BERT training epochs using the five best-performing individual embedding dimensions. While each dimension achieves moderate accuracy on its own, none dominates entirely, confirming that gender information is distributed across multiple dimensions rather than concentrated in a single “gender unit”.

**Figure 7.** Clustered average recall for the Female class across embedding dimensions at BERT epoch 50, with dimensions grouped into 30 K-Means clusters. High-recall clusters are sparse and localized, indicating that female gender information is concentrated in a limited number of dimensions.

**Figure 8.** Clustered average recall for the Male class across embedding dimensions at BERT epoch 50, with dimensions grouped into 30 K-Means clusters. Recall is higher and more evenly distributed, suggesting that male gender information is encoded more diffusely across the embedding space.

**Figure 9.** Accuracy difference between pro-stereotypical and anti-stereotypical gender–profession pairs. Accuracy is substantially higher for stereotypical sentences (82.55%) than for anti-stereotypical ones (43.71%), indicating a strong alignment with societal gender stereotypes. The dotted line marks chance level (50%).

**Figure 10.** Gender prediction accuracy split by target gender and stereotype alignment. Accuracy is higher for stereotypical cases in both genders, but the effect is more pronounced for male targets (93% vs. 40%) than for female targets (78% vs. 46%), indicating a stronger bias toward male-stereotypical associations.

**Figure 11.** Prediction accuracy across profession variants, split by target gender, stereotype alignment, and female-marked attributes. For female targets, female-marked attribute suffixes dramatically increase accuracy, especially in anti-stereotypical contexts (83% vs. 9%).

**Figure 12.** Architectural overview of the Dutch BERT model. The model processes input text from bottom to top: the Tokenizer converts raw sentences into token sequences, which are then fed into a 12-layer BERT encoder comprising self-attention and feed-forward sublayers. The output embeddings are passed through the MLM (Masked Language Modeling) head, which predicts masked tokens during pretraining. Contextual embedd…

**Figure 13.** Training and validation loss across 100 epochs of masked language modeling (MLM). Both curves show a steep decline early on. The validation loss drops rapidly in the first 20 epochs, with a sudden drop between epochs 16 and 20. After this, the loss decreases slowly. By the final epoch, the model achieves a validation loss of 2.23, indicating that it has converged reasonably well. This suggests the mo…
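To make the gaps quoted in the Figure 9 and Figure 10 captions concrete, the short sketch below recomputes them in percentage points. The accuracies are copied from those captions; the variable and function names are illustrative only, and applying the 50% chance level from Figure 9 to the per-gender numbers assumes the same binary setup.

```python
# Accuracies (percent) copied from the Figure 9 and Figure 10 captions.
CHANCE = 50.0
overall = {"pro": 82.55, "anti": 43.71}
by_target = {
    "male": {"pro": 93.0, "anti": 40.0},
    "female": {"pro": 78.0, "anti": 46.0},
}

def stereotype_gap(acc):
    """Pro-stereotypical minus anti-stereotypical accuracy, in percentage points."""
    return round(acc["pro"] - acc["anti"], 2)

overall_gap = stereotype_gap(overall)
gaps = {target: stereotype_gap(acc) for target, acc in by_target.items()}

# Conditions whose anti-stereotypical accuracy falls below chance.
conditions = {"overall": overall, **by_target}
below_chance = [name for name, acc in conditions.items() if acc["anti"] < CHANCE]

print(f"overall gap: {overall_gap} points")
for target, g in gaps.items():
    print(f"{target} targets: {g} points")
print("anti-stereotypical accuracy below chance for:", below_chance)
```

The overall gap is 38.84 points, and the gap is wider for male targets (53 points) than for female targets (32 points). All three anti-stereotypical accuracies sit below the 50% chance level.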

Original abstract

Gender bias in large language models has primarily been investigated for English, while languages with grammatical or morphological gender remain comparatively understudied. This paper investigates how and when gender information emerges in a Dutch BERT model trained from scratch, offering one of the first checkpoint-level analyses of bias formation in a Transformer architecture for a language combining overt morphological gender marking and generic forms. By extracting contextual embeddings throughout training, we construct dynamic gender subspaces using linear SVMs to trace when gender becomes linearly encoded and how this encoding evolves over time. Contextual embeddings are often assumed to integrate contextual cues robustly, allowing models to adjust the representation of a word depending on its more local usage. We therefore test whether explicit gender cues in controlled sentence templates (e.g., Zij is een loodgieter ('She is a plumber')) can override learned statistical associations (plumber -> male). Our findings challenge this assumption: although gender becomes clearly linearly separable around epoch 20 and is distributed across multiple embedding dimensions, the model struggles to update its internal gender representation in light of explicit contextual cues in short sentence templates. Stereotypical gender-profession pairings are predicted far more accurately than anti-stereotypical ones, and generic forms in Dutch systematically default to a male interpretation, even when the context explicitly denotes a female referent. Together, our results seem to indicate that contextualization in the representations learned by our Dutch BERT model is not sufficiently dynamic along the probed gender direction: explicit gender cues in anti-stereotypical contexts are not reliably reflected in the resulting representations, resulting in persistent male-default behaviour.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes the emergence and contextual integration of gender bias in a Dutch BERT model trained from scratch. It extracts contextual embeddings at training checkpoints and uses linear SVMs to construct dynamic gender subspaces, tracking when gender information becomes linearly separable (around epoch 20). It then tests integration of explicit gender cues via short controlled sentence templates (e.g., 'Zij is een loodgieter' for anti-stereotypical professions), finding that stereotypical associations dominate predictions, generic masculine forms default to male interpretations, and explicit female cues fail to reliably shift representations away from male defaults.

Significance. If the results hold under more varied conditions, the work provides one of the first checkpoint-level views of bias formation in a Transformer for a language with overt morphological gender marking. The SVM-based probing of evolving subspaces is a clear methodological strength, offering traceable metrics for linear encoding of gender. The findings challenge assumptions about robust contextualization in embeddings and have implications for debiasing in multilingual models, though the scope is limited to Dutch and short templates.

major comments (2)

[§4] §4 (template-based evaluation) and abstract: the central claim that 'contextualization ... is not sufficiently dynamic' and that explicit cues 'are not reliably reflected' rests on short hand-crafted templates without any reported comparisons to longer natural sentences or varied syntactic frames. If richer contexts allow the model to override training-data statistics, the observed male-default behavior and anti-stereotypical prediction gaps could be template-specific rather than a general property of the learned representations.
[Results] Results section and abstract: the claim that 'stereotypical gender-profession pairings are predicted far more accurately than anti-stereotypical ones' lacks reported effect sizes, statistical significance tests, or baselines (e.g., majority-class predictor or non-contextual embeddings). Without these, the magnitude and robustness of the bias cannot be fully assessed, weakening verification of the persistent male-default conclusion.

minor comments (2)

[§3] The description of SVM probe construction (likely §3) would benefit from explicit reporting of hyperparameter choices and cross-validation details to ensure reproducibility of the gender subspace tracing.
[Figures] The checkpoint-level separability plots should show error bars or confidence intervals, with captions and axis labels stating what they represent, to clarify variability across runs.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing the strongest honest defense of our work while acknowledging limitations where they exist. Revisions have been made to strengthen the paper accordingly.


Referee: [§4] §4 (template-based evaluation) and abstract: the central claim that 'contextualization ... is not sufficiently dynamic' and that explicit cues 'are not reliably reflected' rests on short hand-crafted templates without any reported comparisons to longer natural sentences or varied syntactic frames. If richer contexts allow the model to override training-data statistics, the observed male-default behavior and anti-stereotypical prediction gaps could be template-specific rather than a general property of the learned representations.

Authors: We agree that the evaluation is restricted to short, hand-crafted templates and that this constitutes a genuine limitation for generalizing the claims about contextualization to richer or longer natural sentences. Our design choice was intentional: short templates provide the minimal context in which explicit gender cues (e.g., 'Zij') should most easily override stereotypical associations if dynamic integration were robust. Failure to do so in these controlled settings is therefore informative about the model's limitations. Nevertheless, we have revised the Discussion section to explicitly acknowledge this scope restriction, to discuss the possibility that longer contexts might yield different behavior, and to outline future experiments with varied syntactic frames and natural corpora. No new experiments on longer sentences were feasible within the current study, but the added discussion addresses the referee's concern directly. revision: partial
Referee: [Results] Results section and abstract: the claim that 'stereotypical gender-profession pairings are predicted far more accurately than anti-stereotypical ones' lacks reported effect sizes, statistical significance tests, or baselines (e.g., majority-class predictor or non-contextual embeddings). Without these, the magnitude and robustness of the bias cannot be fully assessed, weakening verification of the persistent male-default conclusion.

Authors: We accept this criticism as valid. The original manuscript did not include effect sizes, formal significance tests, or explicit baselines for the accuracy gap between stereotypical and anti-stereotypical conditions. In the revised version we have added: (1) effect sizes (Cohen's h for proportion differences), (2) McNemar's tests for paired accuracy comparisons, and (3) two baselines—a majority-class predictor derived from profession gender frequencies in the training data and a non-contextual static embedding baseline. These additions are now reported in the Results section and confirm that the observed gaps are both statistically significant and substantially larger than the baselines, thereby strengthening the evidence for persistent male-default behavior. revision: yes
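The rebuttal names Cohen's h and McNemar's test but shows no computation. As a rough sketch of what those statistics involve, the snippet below computes Cohen's h, defined as 2·arcsin(√p1) − 2·arcsin(√p2), from the two accuracies in the Figure 9 caption, and an exact two-sided McNemar p-value from discordant-pair counts. The counts `b=30` and `c=8` are invented purely for illustration, since the paper's paired outcomes are not available here, and the function names are hypothetical.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts: pairs where only the first
    condition was classified correctly (b) or only the second (c).
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Proportions from the Figure 9 caption: 82.55% vs. 43.71%.
h = cohens_h(0.8255, 0.4371)

# Hypothetical discordant counts (NOT from the paper): 30 sentence pairs
# correct only in the pro-stereotypical version, 8 only in the anti version.
p_value = mcnemar_exact(b=30, c=8)

print(f"Cohen's h = {h:.3f}")
print(f"exact McNemar p = {p_value:.5f}")
```

By Cohen's rule of thumb (0.2 small, 0.5 medium, 0.8 large), the h implied by the reported accuracies falls in the large range. McNemar's test only applies if each pro-stereotypical sentence is paired with an anti-stereotypical counterpart, which the template design suggests but which should be checked against the paper's actual setup.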

Circularity Check

0 steps flagged

No circularity: empirical probing of trained embeddings


The paper is an empirical study: it trains a Dutch BERT model from scratch, extracts contextual embeddings at training checkpoints, constructs gender subspaces via linear SVMs, and evaluates them on controlled sentence templates. No derivation, equation, or prediction reduces the measured gender direction or contextualization behavior to a fitted parameter or to a self-citation defined by the target result itself. All claims rest on direct observations from the trained model and the template tests, so the argument contains no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard assumptions of linear probing and template-based testing without introducing new free parameters, axioms beyond domain norms, or invented entities.

axioms (1)

domain assumption: Gender can be treated as a linearly separable direction in contextual embedding space that an SVM can reliably extract at different training stages.
Invoked when constructing dynamic gender subspaces from embeddings throughout training.

pith-pipeline@v0.9.0 · 5583 in / 1280 out tokens · 29866 ms · 2026-05-11T02:09:23.899350+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear
we construct dynamic gender subspaces using linear SVMs to trace when gender becomes linearly encoded... project profession terms within controlled sentence templates... onto the learned gender subspace
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
Stereotypical gender–profession pairings are predicted far more accurately than anti-stereotypical ones

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 4 internal anchors

[1]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin and Ming. CoRR , volume =. 2018 , url =. 1810.04805 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2018
[2]

doi: 10.18653/v1/N18-1202

Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke. Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018. doi:10...

work page doi:10.18653/v1/n18-1202 2018
[3]

Elisa Celis

Yi Chern Tan and L. Elisa Celis , title =. CoRR , volume =. 2019 , url =. 1911.01485 , timestamp =

work page arXiv 2019
[4]

URL https://doi.org/10.1145/ 3442188.3445922

Bender, Emily M. and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret , title =. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages =. 2021 , isbn =. doi:10.1145/3442188.3445922 , abstract =

work page doi:10.1145/3442188.3445922 2021
[5]

Getting Gender Right in Neural Machine Translation

Vanmassenhove, Eva and Hardmeier, Christian and Way, Andy. Getting Gender Right in Neural Machine Translation. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1334

work page doi:10.18653/v1/d18-1334 2018
[6]

Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem

Saunders, Danielle and Byrne, Bill. Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.690

work page doi:10.18653/v1/2020.acl-main.690 2020
[7]

Patrick Wilhelm, Thorsten Wittkopp, and Odej Kao

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference , author =. arXiv preprint arXiv:2412.13663 , year =

work page arXiv
[8]

arXiv preprint arXiv:2503.05500 , year =

EuroBERT: Scaling Multilingual Encoders for European Languages , author =. arXiv preprint arXiv:2503.05500 , year =

work page arXiv
[9]

Transactions on Machine Learning Research , year =

NeoBERT: A Next-Generation BERT , author =. Transactions on Machine Learning Research , year =

work page
[10]

The Risk of Racial Bias in Hate Speech Detection

Sap, Maarten and Card, Dallas and Gabriel, Saadia and Choi, Yejin and Smith, Noah A. The Risk of Racial Bias in Hate Speech Detection. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1163

work page doi:10.18653/v1/p19-1163 2019
[11]

F air P rism: Evaluating Fairness-Related Harms in Text Generation

Fleisig, Eve and Amstutz, Aubrie and Atalla, Chad and Blodgett, Su Lin and Daum \'e III, Hal and Olteanu, Alexandra and Sheng, Emily and Vann, Dan and Wallach, Hanna. F air P rism: Evaluating Fairness-Related Harms in Text Generation. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi...

work page doi:10.18653/v1/2023.acl-long.343 2023
[12]

The woman worked as a babysitter: On biases in language generation

Sheng, Emily and Chang, Kai-Wei and Natarajan, Premkumar and Peng, Nanyun. The Woman Worked as a Babysitter: On Biases in Language Generation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1339

work page doi:10.18653/v1/d19-1339 2019
[13]

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , journal =

Tolga Bolukbasi and Kai. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , journal =. 2016 , url =. 1607.06520 , timestamp =

work page arXiv 2016
[14]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) , year =

Ravfogel, Shauli and Elazar, Yanai and Gonen, Hila and Twiton, Michael and Goldberg, Yoav. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.647

work page doi:10.18653/v1/2020.acl-main.647 2020
[15]

2022 , eprint=

The Birth of Bias: A case study on the evolution of gender bias in an English language model , author=. 2022 , eprint=

work page 2022
[16]

Bryson and Arvind Narayanan , title =

Aylin Caliskan Islam and Joanna J. Bryson and Arvind Narayanan , title =. CoRR , volume =. 2016 , url =. 1608.07187 , timestamp =

work page arXiv 2016
[17]

Science , volume=

Semantics derived automatically from language corpora contain human-like biases , author=. Science , volume=. 2017 , publisher=

work page 2017
[18]

and Rudinger, Rachel

May, Chandler and Wang, Alex and Bordia, Shikha and Bowman, Samuel R. and Rudinger, Rachel. On Measuring Social Biases in Sentence Encoders. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1063

work page doi:10.18653/v1/n19-1063 2019
[19]

Gender bias in coreference resolution: Evaluation and debiasing methods

Zhao, Jieyu and Wang, Tianlu and Yatskar, Mark and Ordonez, Vicente and Chang, Kai-Wei. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018. doi:10.18653/v1/N18-2003

work page doi:10.18653/v1/n18-2003 2018
[20]

Rudinger, J

Rudinger, Rachel and Naradowsky, Jason and Leonard, Brian and Van Durme, Benjamin. Gender Bias in Coreference Resolution. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018. doi:10.18653/v1/N18-2002

work page doi:10.18653/v1/n18-2002 2018
[21]

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT `s Gender Bias

Bartl, Marion and Nissim, Malvina and Gatt, Albert. Unmasking Contextual Stereotypes: Measuring and Mitigating BERT `s Gender Bias. Proceedings of the Second Workshop on Gender Bias in Natural Language Processing. 2020

work page 2020
[22]

Measuring Bias in Contextualized Word Representations

Kurita, Keita and Vyas, Nidhi and Pareek, Ayush and Black, Alan W and Tsvetkov, Yulia. Measuring Bias in Contextualized Word Representations. Proceedings of the First Workshop on Gender Bias in Natural Language Processing. 2019. doi:10.18653/v1/W19-3823

work page doi:10.18653/v1/w19-3823 2019
[23]

C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R. C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.154

work page doi:10.18653/v1/2020.emnlp-main.154 2020
[24]

S tereo S et: Measuring stereotypical bias in pretrained language models

Nadeem, Moin and Bethke, Anna and Reddy, Siva. S tereo S et: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.416

work page doi:10.18653/v1/2021.acl-long.416 2021
[25]

and Kirchhoff, Katrin

Salazar, Julian and Liang, Davis and Nguyen, Toan Q. and Kirchhoff, Katrin. Masked Language Model Scoring. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.240

work page doi:10.18653/v1/2020.acl-main.240 2020
[26]

How Does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?

Gonen, Hila and Kementchedjhieva, Yova and Goldberg, Yoav. How Does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 2019. doi:10.18653/v1/K19-1043

work page doi:10.18653/v1/k19-1043 2019
[27]

Evaluating Bias In D utch Word Embeddings

Ch \'a vez Mulsa, Rodrigo Alejandro and Spanakis, Gerasimos. Evaluating Bias In D utch Word Embeddings. Proceedings of the Second Workshop on Gender Bias in Natural Language Processing. 2020

work page 2020
[28]

Examining Gender Bias in Languages with Grammatical Gender

Zhou, Pei and Shi, Weijia and Zhao, Jieyu and Huang, Kuan-Hao and Chen, Muhao and Cotterell, Ryan and Chang, Kai-Wei. Examining Gender Bias in Languages with Grammatical Gender. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 20...

work page doi:10.18653/v1/d19-1531 2019
[29]

F rench C row S -pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than E nglish

N. F rench C row S -Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than E nglish. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.583

work page doi:10.18653/v1/2022.acl-long.583 2022
[30]

Debiasing Pre-trained Contextualised Embeddings

Kaneko, Masahiro and Bollegala, Danushka. Debiasing Pre-trained Contextualised Embeddings. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. doi:10.18653/v1/2021.eacl-main.107

work page doi:10.18653/v1/2021.eacl-main.107 2021
[31]

SoNaR-corpus (Version 1.2.1) , year =

work page
[32]

Multilingual Nonce Dependency Treebanks: Understanding how Language Models Represent and Process Syntactic Structure

Arps, David and Kallmeyer, Laura and Samih, Younes and Sajjad, Hassan. Multilingual Nonce Dependency Treebanks: Understanding how Language Models Represent and Process Syntactic Structure. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). ...

work page doi:10.18653/v1/2024.naacl-long.433 2024
[33]

Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation

Manna, Chiara and Alishahi, Afra and Blain, Fr \'e d \'e ric and Vanmassenhove, Eva. Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation. Proceedings of the 3rd Workshop on Gender-Inclusive Translation Technologies (GITT 2025). 2025

work page 2025
[34]

A Structural Probe for Finding Syntax in Word Representations

Hewitt, John and Manning, Christopher D. A Structural Probe for Finding Syntax in Word Representations. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1419

work page doi:10.18653/v1/n19-1419 2019
[35]

2022 , eprint=

A Survey on Bias and Fairness in Natural Language Processing , author=. 2022 , eprint=

work page 2022
[36]

2009 , publisher =

Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit , author =. 2009 , publisher =

work page 2009
[37]

Scikit-learn: machine learning in Python

Fabian Pedregosa and Ga. Scikit-learn: Machine Learning in Python , journal =. 2012 , url =. 1201.0490 , timestamp =

work page arXiv 2012
[38]

Attention Is All You Need

Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , title =. CoRR , volume =. 2017 , url =. 1706.03762 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Cortes, Corinna and Vapnik, Vladimir. Mach. Learn. September 1995. doi:10.1023/A:1022627411411

work page doi:10.1023/a:1022627411411 1995
[40]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu an...

work page
[41]

R., Millman, K

Charles R. Harris and K. Jarrod Millman and St. Array programming with. September 2020. doi:10.1038/s41586-020-2649-2

work page doi:10.1038/s41586-020-2649-2 2020
[42]

Hunter, J. D. Computing in Science & Engineering

work page
[43]

Microsoft Azure Blob Storage

Microsoft Azure Blob Storage. 2024

work page 2024
[44]

Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment

Jumelet, Jaap and Zuidema, Willem and Hupkes, Dieuwke. Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 2019. doi:10.18653/v1/K19-1001

work page doi:10.18653/v1/k19-1001 2019
[45]

How to pretrain a BERT model from scratch. 2021

work page 2021
[46]

LLM Course - Chapter 7.6: Training from Scratch. 2023

work page 2023
[47]

Arnab Bhattacharya

work page
[48]

The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch

The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch. Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme. 2013

work page 2013
[49]

Zij is een powerfeministe. Nog eens functie- en rolbenamingen in het Nederlands vanuit contrastief perspectief

Zij is een powerfeministe. Nog eens functie- en rolbenamingen in het Nederlands vanuit contrastief perspectief. Tijdschrift voor genderstudies

work page
[50]

Towards a more gender-fair usage in Netherlands Dutch

Towards a more gender-fair usage in Netherlands Dutch. Gender Across Languages: The linguistic representation of women and men. 2002

work page 2002
[51]

Over taal en seks, seksisme en emancipatie

Over taal en seks, seksisme en emancipatie. De Gids

work page
[52]

Boudewijn, Julia. 2023

work page 2023
[53]

Losing our Tail, Again: (Un)Natural Selection & Multilingual LLMs

Losing our Tail, Again: On (Un)Natural Selection and Multilingual Large Language Models. arXiv preprint arXiv:2507.03933

[54]

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation. Proceedings of Machine Translation Summit XVII: Research Track

work page
[55]

Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

work page
[56]

Embers of autoregression: Understanding large language models through the problem they are trained to solve

Thomas McCoy and Shunyu Yao and Dan Friedman and Matthew Hardy and Thomas L... Embers of autoregression: Understanding large language models through the problem they are trained to solve. arXiv preprint arXiv:2309.13638

work page arXiv
[57]

A decade of gender bias in machine translation

A decade of gender bias in machine translation. Patterns. 2025

work page 2025
[58]

Changing (S) expectations: How gender fair job descriptions impact children's perceptions and interest regarding traditionally male occupations

Changing (S) expectations: How gender fair job descriptions impact children's perceptions and interest regarding traditionally male occupations. Journal of Vocational Behavior. 2013

work page 2013
[59]

Yes I can! Effects of gender fair job descriptions on children’s perceptions of job status, job difficulty, and vocational self-efficacy

Yes I can! Effects of gender fair job descriptions on children’s perceptions of job status, job difficulty, and vocational self-efficacy. Social Psychology

work page
[60]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692

[61]

Gender-inclusive language in Dutch

Gender-inclusive language in Dutch. Gender-Inclusive Language. Findings from 14 Languages and Open Research Questions. 2026

work page 2026
[62]

Hoe automatische vertaling de genderbias van AI (Artificial Intelligence) verraadt

Hoe automatische vertaling de genderbias van AI (Artificial Intelligence) verraadt. 2021

work page 2021
[63]

and Raimundo Schulz, Emma and Saci, Thiziri and Saidi, Sarah and Torroba Marchante, Javier and Xie, Shilin and Zanotto, Sergio E

Fort, Karen and Alonso Alemany, Laura and Benotti, Luciana and Bezançon, Julien and Borg, Claudia and Borg, Marthese and Chen, Yongjian and Ducel, Fanny and Dupont, Yoann and Ivetta, Guido and Li, Zhijian and Mieskes, Margot and Naguib, Marco and Qian, Yuyan and Radaelli, Matteo and Schmeisser-Nieto, Wolfgang S. and Raimundo Schulz, Emma and Saci, Thizi...

work page 2024
[64]

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

Liang, Sheng and Dufter, Philipp and Sch. Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations. Proceedings of the 28th International Conference on Computational Linguistics. 2020. doi:10.18653/v1/2020.coling-main.446

work page doi:10.18653/v1/2020.coling-main.446 2020
[65]

What Does BERT Learn about the Structure of Language?

Jawahar, Ganesh and Sagot, Benoît and Seddah, Djamé. What Does BERT Learn about the Structure of Language?. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1356

work page doi:10.18653/v1/p19-1356 2019
[66]

BERT Rediscovers the Classical NLP Pipeline

Tenney, Ian and Das, Dipanjan and Pavlick, Ellie. BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1452

work page doi:10.18653/v1/p19-1452 2019