
arxiv: 2605.26769 · v1 · pith:WZ6VMICLnew · submitted 2026-05-26 · 💻 cs.CY · cs.AI

Generative artificial intelligence and the marginalization of minoritized knowledges in higher education: the case of disability

Pith reviewed 2026-07-01 16:13 UTC · model grok-4.3

keywords generative artificial intelligence · higher education · disability studies · epistemic coloniality · marginalization · Western-centric datasets · epistemic plurality

The pith

Generative AI systems marginalize non-hegemonic knowledges in higher education by relying on Anglophone Western-centric training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that generative artificial intelligence is not a neutral technology but actively contributes to the marginalization of non-hegemonic epistemologies as it restructures knowledge production and validation in higher education. It identifies the predominance of Anglophone and Western-centric training datasets as the driver that reinforces epistemic coloniality. The case of persons with disabilities serves as the central illustration, showing how these systems create double marginalization by confining users to reductive stereotypes or excluding them from design processes. The paper explores whether hybridization between researchers and machines could preserve epistemic plurality, while noting that algorithmic correction remains limited as a purely palliative approach.

Core claim

Generative artificial intelligence systems actively contribute to the marginalization of non-hegemonic epistemologies. Training datasets, which remain predominantly Anglophone and Western-centric, reinforce epistemic coloniality. The situation of persons with disabilities provides a particularly clear illustration of this phenomenon through technological architectures that confine these individuals to reductive stereotypes or exclude them from the design process, leading to a double marginalization. The paper examines whether a hybridization between the researcher and the machine might preserve epistemic plurality while acknowledging the structural limitations inherent in algorithmic correction.

What carries the argument

The predominance of Anglophone and Western-centric training datasets that reinforce epistemic coloniality, illustrated through the double marginalization of persons with disabilities.

If this is right

  • Generative AI would restructure the processes of scientific knowledge production and validation in higher education along biased lines.
  • Persons with disabilities would encounter double marginalization through both design exclusion and reductive outputs.
  • Hybridization of human researchers with AI systems could serve as one route to maintain epistemic plurality.
  • Algorithmic correction strategies would remain insufficient to address the underlying structural limitations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar patterns of marginalization could apply to other minoritized groups beyond disability if the dataset bias holds.
  • Efforts to diversify training data sources might reduce epistemic coloniality in educational AI applications.
  • The hybridization idea could be tested through case studies of collaborative human-AI research practices in disability studies.

Load-bearing premise

That training datasets are predominantly Anglophone and Western-centric in a manner that directly causes epistemic coloniality and double marginalization for persons with disabilities.

What would settle it

An empirical audit of the language and cultural composition of major generative AI training datasets used in higher education contexts, paired with evidence on whether outputs systematically stereotype or exclude disabled knowledges.
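As a rough illustration of the first half of such an audit, the sketch below computes language shares from a sample of documents that have already been tagged with a language code. It is a minimal sketch under stated assumptions, not a method described in the paper: the function names `language_shares` and `dominant_share`, the choice of `"en"` as the dominant group, and the tag list are all hypothetical placeholders, and the tags are made-up inputs rather than measurements of any real corpus.

```python
from collections import Counter


def language_shares(language_tags):
    """Return each language's share of a tagged sample as a fraction of the total."""
    counts = Counter(language_tags)
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful shares.
        return {}
    return {lang: n / total for lang, n in counts.items()}


def dominant_share(shares, dominant=("en",)):
    """Sum the shares of the languages treated as dominant (an assumption of this sketch)."""
    return sum(shares.get(lang, 0.0) for lang in dominant)


# Placeholder tags for illustration only; these are not real corpus statistics.
sample_tags = ["en", "en", "en", "fr", "sw", "en", "hi", "en"]
shares = language_shares(sample_tags)

for lang, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{lang}: {share:.1%}")
print(f"dominant-language share: {dominant_share(shares):.1%}")
```

A share table of this kind would only address language composition. Cultural composition and the question of whether outputs stereotype or exclude disabled knowledges would need separate, qualitatively grounded evidence.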

Original abstract

Generative artificial intelligence redefines higher education by restructuring the processes through which scientific knowledge is produced and validated. These systems are not neutral; they actively contribute to the marginalization of non-hegemonic epistemologies. This research draws upon educational sciences, critical technology studies, and disability studies to demonstrate that training datasets, which remain predominantly Anglophone and Western-centric, reinforce epistemic coloniality. The situation of persons with disabilities provides a particularly clear illustration of this phenomenon. Technological architectures frequently confine these individuals to reductive stereotypes or exclude them from the design process, leading to a double marginalization. This article examines whether a hybridization between the researcher and the machine might preserve epistemic plurality, while acknowledging the structural limitations inherent in algorithmic correction when used as a purely palliative strategy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that generative AI redefines higher education by restructuring knowledge production and validation processes in non-neutral ways that actively marginalize non-hegemonic epistemologies. Drawing on educational sciences, critical technology studies, and disability studies, it asserts that training datasets remain predominantly Anglophone and Western-centric and thereby reinforce epistemic coloniality; the situation of persons with disabilities is presented as a clear case of double marginalization through reductive stereotypes or exclusion from design. The paper examines whether researcher-machine hybridization might preserve epistemic plurality while acknowledging structural limits of algorithmic correction as a palliative measure.

Significance. If the links between dataset composition, epistemic coloniality, and specific marginalization outcomes were demonstrated, the work would add to interdisciplinary conversations on AI ethics, epistemic justice, and inclusive design in educational technology. It surfaces timely questions about whose knowledges are encoded in GenAI systems used in higher education. As currently presented, however, the contribution remains primarily conceptual and does not supply new empirical mappings, dataset audits, or falsifiable predictions that would strengthen its standing in the literature.

major comments (2)
  1. [Abstract] The assertion that 'training datasets, which remain predominantly Anglophone and Western-centric, reinforce epistemic coloniality' is presented as a premise without any accompanying dataset audit, language-distribution statistics, citations to empirical provenance studies, or analysis of the specific corpora underlying current GenAI models. This premise is load-bearing for the central claim of active marginalization of minoritized knowledges.
  2. [Discussion of disability / double marginalization] The claim of 'double marginalization' via 'reductive stereotypes or exclusion from the design process' is stated without concrete examples of GenAI outputs in higher-education settings, references to particular model behaviors, or evidence tracing the causal pathway from dataset skew to epistemic exclusion of disabled knowledges. No case-level illustration or cited empirical work is supplied to ground the link.
minor comments (1)
  1. [Abstract] The abstract opens by stating 'This research draws upon...' yet the manuscript functions as a conceptual position piece; an explicit statement of scope (theoretical argument versus empirical study) would clarify expectations for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight opportunities to strengthen the manuscript's grounding. Our response addresses each major comment directly. The paper is primarily conceptual and draws on interdisciplinary literature; we will incorporate additional citations to empirical studies on dataset biases and disability-related AI exclusions to better support the premises, while preserving the theoretical focus. We cannot undertake new empirical audits or original data collection.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'training datasets, which remain predominantly Anglophone and Western-centric, reinforce epistemic coloniality' is presented as a premise without any accompanying dataset audit, language-distribution statistics, citations to empirical provenance studies, or analysis of the specific corpora underlying current GenAI models. This premise is load-bearing for the central claim of active marginalization of minoritized knowledges.

    Authors: We accept that the abstract would be strengthened by explicit citations. In revision we will add references to established empirical studies on the linguistic and cultural composition of major training corpora (e.g., analyses of Common Crawl, The Pile, and model-specific documentation from OpenAI, Google, and Meta). These citations will make the premise traceable to existing provenance research. A new dataset audit or original language-distribution statistics remain outside the scope of this conceptual paper; the revision is therefore partial.

    Revision: partial

  2. Referee: [Discussion of disability / double marginalization] The claim of 'double marginalization' via 'reductive stereotypes or exclusion from the design process' is stated without concrete examples of GenAI outputs in higher-education settings, references to particular model behaviors, or evidence tracing the causal pathway from dataset skew to epistemic exclusion of disabled knowledges. No case-level illustration or cited empirical work is supplied to ground the link.

    Authors: We will revise the disability section to include concrete illustrations drawn from published empirical work, such as documented cases of stereotypical image generation for disabled students, accessibility failures in AI tutoring systems, and studies on the under-representation of disability in training data and design teams. Citations to relevant disability-studies and HCI literature will be added to trace the pathway from dataset skew to epistemic exclusion. As the manuscript is conceptual rather than empirical, we synthesize existing evidence rather than present new case studies or original causal analyses; the revision is therefore partial.

    Revision: partial

Circularity Check

0 steps flagged

No circularity; interpretive argument draws on external fields without self-referential reductions.

Full rationale

The paper advances a conceptual claim about generative AI and epistemic marginalization by referencing external literatures in educational sciences, critical technology studies, and disability studies. No equations, parameter fitting, predictions, or derivations appear. The assertions regarding dataset composition and double marginalization for disabled persons are presented as premises supported by cited fields rather than reducing to the paper's own inputs by definition or self-citation chain. This matches the default non-circular structure for a humanities/social-science argument.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions from critical theory without new empirical grounding or independent evidence supplied in the abstract.

axioms (2)
  • Domain assumption: Generative AI systems are not neutral and actively contribute to marginalization of non-hegemonic epistemologies.
    Invoked as the starting point for the analysis in the abstract.
  • Domain assumption: Training datasets are predominantly Anglophone and Western-centric.
    Stated without quantification or citation in the abstract.


