Recognition: 2 theorem links
DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models
Pith reviewed 2026-05-14 20:05 UTC · model grok-4.3
The pith
General-purpose safety benchmarks for language models miss disability harms because those harms are personal, intersectional, and community-defined.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DisaBench supplies a taxonomy of twelve disability harm categories developed with people who have lived experience, a method that pairs benign and adversarial prompts in seven life domains, and a set of 175 prompts whose 525 responses were labeled by four evaluators with disabilities. The annotations establish that harm rates vary by disability type and intensify outside text, that terminology harms are culturally and temporally specific, and that standard safety checks catch only overt problems while missing subtler ones visible only to domain experts. Disability harm therefore cannot be separated from a person's full identity and community, so general-purpose benchmarks miss it by design.
What carries the argument
The taxonomy of twelve disability harm categories co-created with evaluators who have lived experience, which structures the prompt pairs and human annotations to surface context-dependent harms.
If this is right
- Safety pipelines must add participatory review steps to catch harms that standard red-teaming overlooks.
- Evaluation datasets need separate tracks for each disability type rather than a single aggregate score.
- Terminology checks in benchmarks require regular updates to match shifting cultural standards.
- Non-text model outputs will require new testing layers because harms compound there.
- The framework can slot directly into current safety tools without extra infrastructure.
Where Pith is reading between the lines
- Model developers will need to treat community involvement as a recurring requirement rather than a one-time audit.
- The same participatory structure could be adapted to measure harms experienced by other groups whose identities are not captured in broad benchmarks.
- Training data filtering rules may need revision once subtle harms become measurable.
- Real-world deployment of these models should include ongoing feedback loops with affected communities.
Load-bearing premise
The taxonomy and 175 prompts developed with four evaluators who have lived experience are enough to represent the full range of disability harms across cultures, time, and non-text forms.
What would settle it
A side-by-side test in which the same 175 prompts and four lived-experience annotators are run through an existing general safety benchmark and produce the same detection rates for the subtle, context-specific harms that DisaBench identifies.
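That comparison can be operationalized as a per-category detection-rate gap. The sketch below is illustrative only: the category, labels, and rates are invented for this example, not taken from the paper.

```python
# Illustrative comparison of detection rates between lived-experience
# annotators and a generic safety classifier on the same responses.
# All flag values below are hypothetical.

def detection_rate(flags):
    """Fraction of responses flagged as harmful (1 = flagged, 0 = not)."""
    return sum(flags) / len(flags)

# Six responses in one hypothetical "subtle terminology harm" category.
human_flags = [1, 1, 1, 0, 1, 1]       # domain experts catch subtle cases
classifier_flags = [1, 0, 0, 0, 0, 0]  # generic classifier catches overt ones

gap = detection_rate(human_flags) - detection_rate(classifier_flags)
print(f"human: {detection_rate(human_flags):.2f}, "
      f"classifier: {detection_rate(classifier_flags):.2f}, gap: {gap:.2f}")
```

A gap near zero across categories would count against the claim that general-purpose benchmarks systematically miss the subtle harms; a large gap concentrated in the subtle categories would support it.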
Original abstract
General-purpose safety benchmarks for large language models do not adequately evaluate disability-related harms. We introduce DisaBench: a taxonomy of twelve disability harm categories co-created with people with disabilities and red teaming experts, a taxonomy-driven evaluation methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts with human-annotated labels on 525 prompt-response pairs. Annotation by four evaluators with lived disability experience reveals three findings: harm rates vary sharply by disability type and will compound in non-text modalities, terminology-driven harm is culturally and temporally bound rather than universally assessable, and standard safety evaluation catches overt failures while missing the subtle harms that only domain expertise can recognize. Disability harm is simultaneously personal, intersectional, and community-defined: it cannot be isolated from the full context of who a person is, and general-purpose benchmarks systematically miss it. We will release the dataset, taxonomy, and methodology via Hugging Face and an open-source red teaming framework for direct integration into existing safety pipelines with no additional infrastructure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DisaBench, a participatory framework for evaluating disability-related harms in LLMs. It presents a 12-category taxonomy co-created with people with disabilities and red-teaming experts, a methodology pairing benign and adversarial prompts across seven life domains, and a dataset of 175 prompts yielding 525 human-annotated prompt-response pairs. Annotations by four evaluators with lived experience support three findings: harm rates vary by disability type and compound in non-text modalities; terminology-driven harm is culturally and temporally bound; and standard safety evaluations miss subtle harms recognizable only with domain expertise. The central claim is that disability harm is personal, intersectional, and community-defined, so general-purpose benchmarks systematically fail to capture it. The artifacts are slated for open release.
Significance. If the participatory methodology and reported patterns hold under broader validation, the work supplies concrete, integrable artifacts (taxonomy, prompts, dataset) that could improve detection of nuanced disability harms currently overlooked by existing safety pipelines. The emphasis on lived-experience annotation and the open red-teaming framework constitute a practical contribution to the field.
Major comments (2)
- §3 (Methodology) and abstract: The three findings rest on annotations from only four evaluators with lived experience. No inter-annotator agreement statistics, prompt sampling protocol, or quantitative overlap metrics with existing safety classifiers on the 525 pairs are reported, leaving the claims that harms 'vary sharply by type' and that standard benchmarks 'systematically miss' subtle cases without sufficient reliability or comparative evidence.
- §4 (Findings) and §5 (Discussion): The assertion that disability harm is 'community-defined' and cannot be isolated from personal/intersectional context is supported solely by the n=4 participatory input and 175-prompt set. This sample size is too small to underwrite the general claim that general-purpose benchmarks miss such harms across cultures and modalities; external validation against larger disability communities or direct quantitative comparison is required.
Minor comments (2)
- The abstract states the dataset will be released via Hugging Face but provides no licensing, versioning, or exact schema details; these should be added for reproducibility.
- The notation for the seven life domains and the benign/adversarial prompt pairing could be clarified with an explicit table or diagram in §3.
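On the schema point, a minimal illustrative record layout helps fix ideas. Every field name and value below is an assumption for illustration; the paper's actual release schema is not specified in the text reviewed here.

```python
# Hypothetical record layout for one prompt-response pair.
# Field names and values are illustrative, not the released schema.
record = {
    "prompt_id": "P-017",
    "harm_category": "terminology",    # one of the 12 taxonomy categories
    "life_domain": "employment",       # one of the 7 life domains
    "variant": "adversarial",          # "benign" or "adversarial" (paired)
    "response_id": "P-017-r2",         # one of several responses per prompt
    "annotator_labels": [1, 1, 0, 1],  # four lived-experience evaluators
}

# The reported counts imply three labeled responses per prompt:
print(175 * 3)  # 525 prompt-response pairs
```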
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our participatory evaluation framework. We address the major comments point by point below, providing clarifications on our methodology while committing to targeted revisions that strengthen transparency without altering the core participatory approach.
Point-by-point responses
Referee: §3 (Methodology) and abstract: The three findings rest on annotations from only four evaluators with lived experience. No inter-annotator agreement statistics, prompt sampling protocol, or quantitative overlap metrics with existing safety classifiers on the 525 pairs are reported, leaving the claims that harms 'vary sharply by type' and that standard benchmarks 'systematically miss' subtle cases without sufficient reliability or comparative evidence.
Authors: We agree that inter-annotator agreement statistics should have been reported for full transparency. In the revised manuscript we will add Fleiss' kappa (or equivalent) computed across the four evaluators' annotations on the 525 pairs. The prompt sampling protocol was taxonomy-driven and systematically spanned the 12 harm categories across the seven life domains through iterative co-creation with the participatory group; we will expand §3 to describe this process explicitly, including how prompts were paired as benign/adversarial. We will also add a quantitative overlap analysis comparing our annotations against outputs from standard safety classifiers (e.g., Perspective API, OpenAI moderation) on the same 525 pairs. The choice of four evaluators with lived experience follows established participatory design principles in disability studies, where depth of expertise is prioritized; we will articulate this rationale more clearly while acknowledging the trade-off in scale. revision: partial
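The promised agreement statistic is straightforward to compute. A minimal Fleiss' kappa sketch follows; the rating counts are invented for illustration, not the paper's annotation data.

```python
# Fleiss' kappa for N items, each rated by n raters into k categories.
# counts[i][j] = number of raters who assigned item i to category j.

def fleiss_kappa(counts):
    n_items = len(counts)
    n_raters = sum(counts[0])  # raters per item (assumed constant)
    total = n_items * n_raters
    n_cats = len(counts[0])
    # Mean per-item agreement: agreeing rater pairs over all rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_cats)
    )
    if p_e == 1.0:  # degenerate case: every rater used one category
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

# Four raters, two categories (harmful / not harmful), invented counts.
print(fleiss_kappa([[4, 0], [0, 4]]))  # perfect agreement -> 1.0
print(fleiss_kappa([[2, 2], [2, 2]]))  # split decisions -> below zero
```

With only four annotators and 525 pairs this runs instantly; the harder question, as the referee notes, is whether n=4 yields a stable estimate at all.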
Referee: §4 (Findings) and §5 (Discussion): The assertion that disability harm is 'community-defined' and cannot be isolated from personal/intersectional context is supported solely by the n=4 participatory input and 175-prompt set. This sample size is too small to underwrite the general claim that general-purpose benchmarks miss such harms across cultures and modalities; external validation against larger disability communities or direct quantitative comparison is required.
Authors: We accept that the n=4 participatory input and 175-prompt set limit broad generalizability, and we do not claim the results constitute exhaustive proof across all cultures or modalities. The central argument is that disability harm is inherently personal and community-defined, which our participatory process was designed to surface; the findings illustrate specific cases where standard benchmarks fail to detect subtle harms that domain experts recognize. We will revise §5 to more explicitly frame the work as an initial demonstration and to include a stronger call for external validation by larger disability communities. The artifacts (taxonomy, prompts, dataset) are released precisely to enable such follow-on studies. Direct quantitative comparison with existing classifiers will be added as noted in the response to §3. revision: partial
Circularity Check
No significant circularity; the contribution is new artifact creation and empirical annotation.
Full rationale
The paper introduces a new taxonomy of 12 categories, 175 prompts, and 525 annotated pairs created via participatory co-design with four evaluators having lived disability experience. The three reported findings (varying harm rates, cultural bounding of terminology harm, and missed subtle cases) are direct observations from annotating this newly constructed dataset rather than any mathematical derivation, fitted parameter, or self-referential reduction. No equations, predictive models, uniqueness theorems, or self-citations appear in the provided text that would make claims equivalent to inputs by construction. The central assertion that general-purpose benchmarks miss harms follows from the new evaluation framework itself, which is self-contained as an independent artifact.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Participatory co-creation with people who have disabilities produces more valid harm categories than expert-only design.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Unclear relation between the paper passage and the cited Recognition theorem.
Paper passage: "taxonomy of twelve disability harm categories co-created with people with disabilities... dataset of 175 prompts with human-annotated labels on 525 prompt-response pairs"
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Unclear relation between the paper passage and the cited Recognition theorem.
Paper passage: "harm rates vary sharply by disability type... standard safety evaluation catches overt failures while missing the subtle harms"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.