pith. machine review for the scientific record.

arxiv: 2605.11365 · v1 · submitted 2026-05-12 · 💻 cs.AI · cs.LG · stat.ML

Recognition: no theorem link

Causal Bias Detection in Generative Artifical Intelligence

Drago Plecko

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 02:33 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · stat.ML
keywords causal fairness · generative AI · bias detection · causal decomposition · large language models · fairness pathways · mechanism replacement

The pith

Generative AI fairness can be quantified by decomposing bias along causal pathways and by measuring how the model's mechanisms replace real-world ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes causal fairness for generative models that build their own beliefs about all causal mechanisms, in contrast to standard machine learning where only a single predictor is learned while inheriting the rest of the world's mechanisms. It unifies both settings under one framework and derives new decomposition results that separate fairness effects into distinct causal paths and into the substitution of real mechanisms by the generative model's own versions. Identification conditions and practical estimators are established, with the approach demonstrated on race and gender bias in large language models. A reader would care because generative systems now produce content rather than merely predict, so bias can propagate through invented causal structures that standard fairness tools miss.
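To make the mechanism-replacement idea concrete, here is a minimal numerical sketch, not the paper's construction: a toy SCM over (X, Z, W, Y) in which each mechanism can be sampled either from the real world or from a model's learned version, mirroring the S-node selection described in Figure 2. All functional forms and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Real-world mechanisms of a toy SCM over (X, Z, W, Y):
# X protected attribute, Z confounder, W mediator, Y outcome.
f_Z = lambda: rng.normal(size=n)
f_W = lambda x, z: 0.8 * x + 0.5 * z + rng.normal(size=n)
f_Y = lambda x, z, w: 1.0 * x + 0.7 * w + 0.4 * z + rng.normal(size=n)

# A generative model's (slightly biased) learned versions of the same
# mechanisms -- coefficients invented for illustration.
g_W = lambda x, z: 1.2 * x + 0.5 * z + rng.normal(size=n)
g_Y = lambda x, z, w: 1.3 * x + 0.7 * w + 0.4 * z + rng.normal(size=n)

def sample_y(mechs, x_do):
    """Sample Y under do(X = x_do); `mechs` picks, per mechanism, the
    real-world ('rw') or model ('m') version -- the S-node idea."""
    z = f_Z()
    x = np.full(n, x_do)
    w = (f_W if mechs["W"] == "rw" else g_W)(x, z)
    y = (f_Y if mechs["Y"] == "rw" else g_Y)(x, z, w)
    return y

def total_effect(mechs):
    return sample_y(mechs, 1.0).mean() - sample_y(mechs, 0.0).mean()

print("all real-world           :", round(total_effect({"W": "rw", "Y": "rw"}), 3))
print("standard ML (replace f_Y):", round(total_effect({"W": "rw", "Y": "m"}), 3))
print("generative (replace both):", round(total_effect({"W": "m", "Y": "m"}), 3))
```

The gaps between the three printed effects are the kind of mechanism-replacement contributions the paper's decompositions are meant to isolate and attribute.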

Core claim

The central claim is that causal fairness in generative AI requires new decomposition results that quantify impacts both along separate causal pathways and through the replacement of real-world causal mechanisms by those implicitly constructed inside the generative model, all under a framework that also covers the standard predictive setting. These quantities are identified from data under stated conditions and can be estimated efficiently, as demonstrated by applying the method to detect bias patterns in large language models.

What carries the argument

The new causal decomposition results that separate fairness contributions into pathway-specific effects and into the difference between real-world and model-constructed mechanisms.
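For orientation, the pathway half of such a decomposition already has a standard shape in the causal fairness literature (Plecko and Bareinboim's TV decomposition, which the ∆x-DE / ∆x-IE / ∆x-SE quantities in Figures 8 and 9 echo). A hedged background sketch, not a formula quoted from this paper:

```latex
% Background sketch from the prior causal-fairness literature: the total
% variation splits into direct, indirect, and spurious pathway terms.
% The paper's new results additionally decompose over the replacement of
% real-world mechanisms by the model's mechanisms.
\mathrm{TV}_{x_0,x_1}(y)
  \;=\; \underbrace{\text{x-DE}_{x_0,x_1}(y \mid x_0)}_{\text{direct}}
  \;-\; \underbrace{\text{x-IE}_{x_1,x_0}(y \mid x_0)}_{\text{indirect}}
  \;-\; \underbrace{\text{x-SE}_{x_1,x_0}(y)}_{\text{spurious}}
```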

If this is right

  • Bias in generative outputs can be attributed to specific causal routes rather than treated as an aggregate disparity.
  • The fairness cost of letting the model invent its own mechanisms instead of using real-world ones becomes measurable.
  • The same framework permits direct comparison of causal fairness between predictive and generative systems.
  • Efficient estimators make it feasible to audit large generative models such as LLMs for these decomposed effects (a sampling sketch follows this list).
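On that last point, a minimal sketch of what such a sampling-based audit loop could look like. `query_model` is a hypothetical stand-in for an LLM API call, and the prompt and outcome mapping are invented; this is not the paper's estimator.

```python
def query_model(prompt: str) -> int:
    """Hypothetical stand-in: draw one completion from the LLM under
    audit and map it to a binary outcome Y in {0, 1}."""
    raise NotImplementedError("replace with a real LLM API call")

def model_conditional(attr: str, n_samples: int = 500) -> float:
    """Monte-Carlo estimate of the model's implied P(Y = 1 | X = attr)."""
    prompt = (f"Write a short profile of a {attr} adult, then answer 1 "
              f"or 0: does the outcome of interest hold for them?")
    return sum(query_model(prompt) for _ in range(n_samples)) / n_samples

# The decompositions would contrast such model-implied conditionals with
# their real-world counterparts, mechanism by mechanism, e.g.:
# disparity = model_conditional("female") - model_conditional("male")
```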

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Auditing tools could intervene on individual causal pathways inside a generative model to reduce bias without retraining the entire system.
  • The framework might extend naturally to image or video generators by treating pixel-level or scene-level mechanisms as the objects being replaced.
  • If certain pathways dominate the decompositions in practice, targeted data interventions on those paths could serve as a practical mitigation strategy.

Load-bearing premise

That the generative model constructs identifiable beliefs over causal mechanisms and that data suffice to estimate the new decompositions without further strong assumptions on the generative process.
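One concrete condition that usually hides inside "data suffice to estimate" is positivity/overlap: every confounder stratum must contain both values of the protected attribute, or the conditional quantities in the decompositions are not estimable from data alone. A minimal diagnostic sketch with illustrative names, not taken from the paper:

```python
import numpy as np

def positivity_violations(x: np.ndarray, strata: np.ndarray) -> list:
    """Return confounder strata in which the protected attribute X
    does not take both values -- there the decompositions' conditional
    quantities cannot be estimated without extrapolation."""
    bad = []
    for s in np.unique(strata):
        if len(np.unique(x[strata == s])) < 2:
            bad.append(s)
    return bad

# e.g. strata = np.digitize(z, np.quantile(z, [0.25, 0.5, 0.75]))
```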

What would settle it

A synthetic-data experiment with a fully known causal graph and mechanisms in which the proposed decompositions and estimators fail to recover the ground-truth contributions of each pathway and mechanism replacement.
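For concreteness, a minimal sketch of such a synthetic check, assuming a linear toy SCM with one mediator; the coefficients and the plug-in estimator are invented for illustration and are not the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy SCM, fully known by construction: X -> Y directly (effect 1.0)
# and X -> W -> Y indirectly (effect 0.8 * 0.7 = 0.56).
x = rng.integers(0, 2, n).astype(float)
w = 0.8 * x + rng.normal(size=n)
y = 1.0 * x + 0.7 * w + rng.normal(size=n)

# Plug-in estimation: fit E[Y | X, W] by least squares, then apply the
# mediation formula for the direct and indirect contributions.
A = np.column_stack([np.ones(n), x, w])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # [intercept, b_x, b_w]
mu = lambda xv, wv: coef[0] + coef[1] * xv + coef[2] * wv
w0, w1 = w[x == 0], w[x == 1]
de_hat = mu(1.0, w0).mean() - mu(0.0, w0).mean()  # direct, mediator held at W_{x0}
ie_hat = mu(1.0, w1).mean() - mu(1.0, w0).mean()  # indirect, via the W shift
print(f"direct  : estimate {de_hat:+.3f}  vs ground truth +1.000")
print(f"indirect: estimate {ie_hat:+.3f}  vs ground truth +0.560")
# The proposal fails the test if these diverge beyond sampling error.
```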

Figures

Figures reproduced from arXiv: 2605.11365 by Drago Plecko.

Figure 1. Standard Fairness Model (SFM) for machine learning and generative AI settings.

Figure 2. Graphical models for (a) the x-specific direct effect and (b) a generic potential outcome in the S-SFM, where the S-node denotes whether each mechanism is sampled from the real world or from the model; in the generative AI setting the S-node points to all of X, Z, W, and Y.

Figure 3. Quantifying differences using S-SFM potential outcomes: with S = s0 for every mechanism (all mechanisms from the real world), the two potential outcomes differ only in the value of X along the direct path X → Y, capturing the direct effect of an x0 → x1 transition in the real world.

Figure 4. Standard Fairness Models for the three datasets.

Figure 5. Hierarchical clustering of model bias signatures (L1 distance, Ward linkage) across developer families and parameter scales; the Llama 3 siblings sit at L1 = 0.39, while the Qwen 3.5 pair (0.62) and the Gemma 3 pair (1.22) are farther apart than many cross-family pairs.

Figure 6. Counterfactual graph for the proof of Prop. 3.

Figure 7. Similarity of model bias signatures: (a) full pairwise …

Figure 8. TV decomposition into ∆x-DE, ∆x-IE, and ∆x-SE for Gemma 3 27B on NSDUH; indirect (0.3% ± 1.3%) and spurious (0.5% ± 2.7%) effects are both small and not significant, indicating that the disparity does not flow through observed mediators or confounders.

Figure 9. TV decomposition into ∆x-DE, ∆x-IE, and ∆x-SE for Qwen 3.5 27B on BRFSS.
read the original abstract

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper formalizes causal fairness in generative AI and unifies it with the standard ML setting under a common framework. It derives new causal decomposition results to quantify fairness impacts along different causal pathways and via replacement of real-world mechanisms by the generative model's mechanisms, establishes identification conditions and efficient estimators, and demonstrates the approach by analyzing race and gender bias in large language models across datasets.

Significance. If the decompositions and identification results are sound, the work would meaningfully extend causal fairness methodology to generative models, which implicitly define mechanisms across variables rather than learning a single predictor. The granular pathway and mechanism-replacement decompositions address a clear gap, and the LLM application provides a concrete test case for high-stakes bias detection.

major comments (2)
  1. [Abstract] The central claim of 'new causal decomposition results that enable granular quantification' and 'identification conditions' for generative models is load-bearing, yet the provided text supplies no equations, SCM assumptions, or identification proofs. Without these, it is impossible to assess whether the decompositions reduce to post-hoc fitted quantities or require unverifiable knowledge of the black-box model's implicit causal mechanisms (as flagged in the stress-test).
  2. [Abstract] The unification with standard ML and the claim that generative models 'implicitly construct their own beliefs about all causal mechanisms' rest on the weakest assumption: that identification conditions exist allowing estimation from data without strong additional assumptions on the generative process. For LLMs this is particularly tenuous, as black-box models need not respect a consistent causal structure.
minor comments (1)
  1. Title: 'Artifical' is misspelled and should read 'Artificial'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for identifying areas where the abstract could better convey the paper's technical contributions. We respond to each major comment below and have made targeted revisions to improve clarity without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract] The central claim of 'new causal decomposition results that enable granular quantification' and 'identification conditions' for generative models is load-bearing, yet the provided text supplies no equations, SCM assumptions, or identification proofs. Without these, it is impossible to assess whether the decompositions reduce to post-hoc fitted quantities or require unverifiable knowledge of the black-box model's implicit causal mechanisms (as flagged in the stress-test).

    Authors: The abstract is a high-level summary constrained by length; the full manuscript details the SCM (with observed variables and mechanism replacement), the decomposition theorems (separating pathway-specific effects and real-world vs. model mechanism effects), and identification proofs under standard assumptions (e.g., consistency, positivity, and access to model conditionals). The quantities are identified from observable data and model queries rather than requiring full internal mechanism knowledge or reducing to purely post-hoc fits. We have revised the abstract to include a brief reference to the identification strategy and key assumptions. revision: partial

  2. Referee: [Abstract] The unification with standard ML and the claim that generative models 'implicitly construct their own beliefs about all causal mechanisms' rest on the weakest assumption: that identification conditions exist allowing estimation from data without strong additional assumptions on the generative process. For LLMs this is particularly tenuous, as black-box models need not respect a consistent causal structure.

    Authors: The unification treats standard ML as the special case of replacing only the outcome mechanism, while generative models replace multiple mechanisms via their joint distribution. The framework does not require the generative model to obey a fixed causal DAG or 'consistent structure'; it operates on the model's induced conditionals for counterfactual estimation. For LLMs we explicitly discuss black-box limitations and use prompting-based estimation with sensitivity checks in the experiments. We have revised the abstract and introduction to state the identification conditions more precisely and note the assumptions required for LLMs. revision: yes

Circularity Check

0 steps flagged

No significant circularity: the derivation introduces new formalization and decompositions and does not, by construction, reduce to fitted inputs or self-citations.

full rationale

The paper's abstract and claimed contributions describe formalizing causal fairness for generative AI, unifying it with standard ML, deriving new decomposition results for pathways and mechanism replacement, and establishing identification conditions plus estimators. No load-bearing steps are shown to reduce by definition or construction to prior fitted parameters, self-citations, or ansatzes from the same authors. The central claims rest on new theoretical developments and demonstrations on LLMs rather than renaming known results or smuggling assumptions via self-reference. This qualifies as a self-contained derivation against external benchmarks, with the reader's uncertainty score reflecting absence of equations rather than any exhibited circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review prevents exhaustive listing; the work relies on unspecified identification conditions for causal quantities and standard causal inference background.

axioms (1)
  • domain assumption: Identification conditions exist that allow causal quantities of interest to be recovered from observed data in the generative setting.
    Abstract states that identification conditions are established.

pith-pipeline@v0.9.0 · 5541 in / 1239 out tokens · 64841 ms · 2026-05-13T02:33:19.469143+00:00 · methodology

discussion (0)

