Structural Under-Representation of Women in News: Nonparametric Bayesian Mixtures Capture Time-Dependent Dynamics

Isabella Habereder; Isao Echizen; Thomas Kneib; Timo Spinde

arxiv: 2606.10772 · v1 · pith:G62JLXGQnew · submitted 2026-06-09 · 📊 stat.AP

Pith reviewed 2026-06-27 10:57 UTC · model grok-4.3

classification 📊 stat.AP

keywords gender bias in media · female representation in news · Bayesian nonparametric mixtures · time series clustering · quote share analysis · Canadian media · structural bias · dynamic density estimation

The pith

A time-dependent Bayesian mixture model on Canadian news data shows persistent structural under-representation of women as sources across all identified clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies a nonparametric Bayesian mixture model with Beta kernels to female quote shares from Canadian news articles published between 2019 and 2024. It seeks to uncover latent cluster structures and time trends in gender representation that depend on both topic and reported-on region. A sympathetic reader would care because the results indicate that under-representation appears in every identified cluster, is shaped more by topic than by geography, and shows no improvement toward parity in more than 85% of topic-region time series. Dynamic density estimation also indicates that the aggregate distribution of female quote shares remained stable over the observation period. If accurate, this points to stable patterns in media sourcing that simpler methods may not fully detect.

Core claim

Fitted on Canadian news articles from 2019 to 2024, the model reveals structural under-representation of women across all clusters, with news topic driving differences in female quote shares more strongly than the reported-on region. More than 85% of topic-region time series show no improvement toward gender parity over the observation period. Dynamic density estimation confirms that the aggregate distribution of female quote shares remains stable between 2019 and 2024.
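To make the "more than 85% show no improvement" statistic concrete, here is a minimal sketch of one way such a share could be computed. The paper's actual criterion is not stated in this summary, so the least-squares slope rule, the tolerance `min_slope`, and the toy series are all assumptions made for illustration.

```python
def ols_slope(values):
    """Least-squares slope of values against the time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def share_not_improving(series_by_key, min_slope=0.005):
    """Fraction of series whose yearly slope stays below min_slope.

    Assumes every series lies below 0.5, so a positive slope means
    movement toward parity. min_slope is a hypothetical tolerance.
    """
    flags = [ols_slope(v) < min_slope for v in series_by_key.values()]
    return sum(flags) / len(flags)


# Made-up yearly female quote shares for three (topic, region) pairs.
toy = {
    ("politics", "WesternEurope"): [0.22, 0.23, 0.22, 0.21, 0.23, 0.22],
    ("health", "AngloAmerican"): [0.38, 0.39, 0.41, 0.43, 0.45, 0.47],
    ("sports", "EastAsian"): [0.15, 0.14, 0.15, 0.15, 0.14, 0.15],
}
print(share_not_improving(toy))
```

In this toy input only the health series rises fast enough to count as improving. A model-based analysis would more likely apply the rule to posterior draws of each series rather than to point estimates.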

What carries the argument

Time-dependent Bayesian mixture model with Beta mixture kernel for bounded proportions, used to recover latent clusters and track their evolution.
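As a rough illustration of what a Beta mixture kernel for bounded proportions looks like, the sketch below evaluates a finite mixture of Beta densities in a mean/precision parametrization. The parametrization, the two components, and their weights are assumptions chosen for the example; the summary does not give the paper's parametrization or fitted values.

```python
import math


def beta_pdf(y, mean, precision):
    """Beta density with shape a = mean * precision, b = (1 - mean) * precision."""
    a = mean * precision
    b = (1.0 - mean) * precision
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def mixture_pdf(y, weights, means, precisions):
    """Weighted sum of Beta densities evaluated at y in (0, 1)."""
    return sum(w * beta_pdf(y, m, p) for w, m, p in zip(weights, means, precisions))


# Hypothetical two-component mixture, both components centred below parity.
weights = [0.6, 0.4]
means = [0.20, 0.35]
precisions = [30.0, 40.0]

# Midpoint-rule check that the mixture integrates to roughly 1 on (0, 1).
n_grid = 2000
grid = [(i + 0.5) / n_grid for i in range(n_grid)]
area = sum(mixture_pdf(y, weights, means, precisions) for y in grid) / n_grid
print(round(area, 4))
```

The Beta kernel keeps all probability mass inside (0, 1), which is the reason the abstract gives for using it on quote shares.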

Load-bearing premise

The time-dependent Bayesian mixture model with Beta kernel accurately recovers true latent cluster structures and temporal dynamics without substantial distortion from model assumptions, sampling, or unmeasured factors.

What would settle it

Fitting an alternative clustering method to the same data, or extending the series past 2024, and finding different cluster assignments or a clear rise in female quote shares would contradict the reported stability and structure.

Figures

Figures reproduced from arXiv: 2606.10772 by Isabella Habereder, Isao Echizen, Thomas Kneib, Timo Spinde.

**Figure 1.** Illustration of the cluster mean over time. The bubble size is proportional to the number of observations. Excerpt from the surrounding text: "… pattern, where two topics (topic 2 - public health/healthcare; topic 5 - culture) alone account for over half of cluster 0. The region × topic heatmap (see fig. 6 in section 8.1 in the appendix) makes the dominance of topic effects visually explicit. Columns for topic 2 (public health/healthcare) and …"

**Figure 2.** Illustration of the posterior co-clustering matrices for 2019–2024. Excerpt from the surrounding text: "… America consistently occupies the upper end of this range (peaking in 2024), while the regions occupying the lower end are not consistent over time. Reports about the Russian and the Oriental region are placed at the lower end in 2019–2021. In 2023–2024, the lower end is composed of reports on South Asia and East Asia. Despite this orderi…"

**Figure 3.** Plot of the density estimation aggregated to the single years. Excerpt from the surrounding text: "… care, and medical research as female-dominated topics in terms of quote share (based on the same data; see section 3). However, our results reveal a significant difference: despite a higher share of female quotes than in other topics, these topics still remain male-dominated at our level of aggregation. However, it can be concluded that the clu…"

**Figure 4.** Topic composition per cluster. Plot text extracted with the caption: axis "Share of observations"; region labels AngloAmerican, AustralianOceanian, EastAsian, EasternEurope, LatinAmerican, Oriental, Russian, SouthAsian, SoutheastAsian, SubSaharanAfrican, WesternEurope; panel title "Cluster 0 (mean=0.313)".

**Figure 5.** Region composition per cluster.

**Figure 6.** Region × topic heatmap of posterior mean across all time points. Plot text extracted with the caption: topic labels T0–T14; axis "Variance (x10^-4)"; legend entries "Within-region variance" and "Between-region variance".

**Figure 7.** Illustration of the between- vs. within-region variance by topic.

**Figure 8.** Region-level traces over time. We averaged the posterior mean across topics. Excerpt from the surrounding text (section 8.2, Data Pre-Processing): "The dataset consists of English- and French-language text. First, we determined each article's language by using a pretrained language classifier fastText [19, 20]. We only included English-language news articles in our analysis. We extended the narrative location for each article. We define narrative locat…"

**Figure 9.** Illustration of the posterior co-clustering matrices for the country × topic aggregation for 2019–2024.
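For readers unfamiliar with the posterior co-clustering matrices named in the captions of Figures 2 and 9, the following sketch shows the conventional construction: entry (i, j) is the fraction of posterior draws in which items i and j receive the same cluster label. The label draws are made up, and the paper's sampler and any post-processing are not described in this summary, so this is an illustration of the general idea only.

```python
def co_clustering_matrix(label_draws):
    """Return the matrix whose (i, j) entry is the share of draws with labels[i] == labels[j]."""
    n_draws = len(label_draws)
    n_items = len(label_draws[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in label_draws:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_draws for c in row] for row in counts]


# Four hypothetical posterior draws of cluster labels for four series.
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
pcm = co_clustering_matrix(draws)
for row in pcm:
    print(row)
```

Because only equality of labels within a draw matters, the matrix is unaffected by label switching between draws, which is why it is a common summary for mixture-model output.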

Original abstract

The under-representation of women as sources cited in news media is one prominent representation of gender bias. Understanding where gender bias concentrates and how it evolves is essential for targeted mitigation. Because gender representation varies across topics, time, and reported-on regions, creating complex dependencies that are difficult to capture parametrically, we employ a nonparametric model to uncover latent cluster structures and temporal dynamics. We combine time-dependent Bayesian mixture modeling techniques with a Beta mixture kernel tailored to female quote shares, bounded between 0 and 1. Fitted on Canadian news articles from 2019 to 2024, the model reveals structural under-representation of women across all clusters, with news topic driving differences in female quote shares more strongly than the reported-on region. More than 85% of topic-region time series show no improvement toward gender parity over the observation period. Dynamic density estimation confirms that the aggregate distribution of female quote shares remains stable between 2019 and 2024. Our application demonstrates that advanced probabilistic models not only reproduce findings in gender bias research but also reveal latent dependencies and structural patterns that simpler approaches miss, encouraging future adoption of model-based frameworks for studying media bias.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper fits a time-dependent Bayesian mixture to Canadian news quote shares and reports stable under-representation of women driven more by topic than region, with over 85% of series showing no improvement, but supplies no recovery checks or data-validation steps.

The core contribution is an application of nonparametric Bayesian mixtures with Beta kernels to female quote proportions in Canadian news articles from 2019-2024. It finds structural under-representation across clusters, stronger effects from news topic than from reported-on region, more than 85% of topic-region series with no movement toward parity, and a stable aggregate density over the period.

The work does a clean job of moving beyond simple averages to latent cluster structures and time dynamics on real proportion data. That distinction between topic and region effects is useful for anyone thinking about targeted interventions, and the claim that simpler approaches miss these patterns is at least plausible given the model choice.

The main limitation is the lack of any reported simulation recovery experiments or sensitivity checks on the time-dependent mixture. Without those, the stability result and the topic-dominance ordering could be influenced by the nonparametric prior, the Beta kernel, or unexamined sampling choices in article and quote extraction. The abstract gives percentages but no error bars, cross-validation, or alternative specifications, which leaves the quantitative claims harder to assess.

This paper is for computational social scientists or media-bias researchers who already work with mixture models on bounded data and want a concrete case study. A reader focused on gender sourcing in one national media system will find the numbers and the topic-versus-region split worth seeing.

It is worth sending to peer review. The empirical scope is limited but honest, the modeling framework is appropriate, and the gaps are fixable with added validation rather than fatal to the design.

Referee Report

2 major / 2 minor

Summary. The paper fits a nonparametric time-dependent Bayesian mixture model with Beta kernel to female quote shares extracted from Canadian news articles (2019–2024). It reports structural under-representation of women in all recovered clusters, stronger influence of news topic than reported-on region, that >85% of topic-region time series exhibit no improvement toward parity, and that the aggregate density of female quote shares remains stable over the period. The work positions the model as revealing latent dependencies missed by simpler approaches.

Significance. If the recovery properties of the time-dependent Beta mixture hold, the results would supply quantitative evidence of persistent, topic-driven gender bias in Canadian media and illustrate the added value of nonparametric dynamic mixtures for media-bias studies. The stability finding and topic-versus-region comparison would be directly usable for targeted interventions.

major comments (2)

[§3] §3 (Model specification and fitting): the central claims (structural under-representation across clusters, topic dominance, >85% non-improving series, stable aggregate density) rest on the time-dependent Beta mixture correctly recovering latent cluster assignments and temporal trends. No simulation recovery experiment is described that injects known cluster labels, known improving vs. stable trajectories, and known topic/region effects and then verifies that the fitted model returns the reported proportions and ordering. Without this check, misspecification in the time-dependence mechanism or nonparametric prior could produce the observed stability and topic dominance as artifacts.
[§4] §4 (Results): data collection and quote-extraction details (article sampling frame, quote attribution rules, handling of multiple quotes per article) are not accompanied by sensitivity checks or bias diagnostics under the model. These steps are load-bearing for the claim that topic drives differences more strongly than region.
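The recovery experiment requested in the first major comment starts from synthetic data with known ground truth. Below is a minimal sketch of such a generator, assuming three hypothetical clusters defined by a starting mean and a yearly slope, with Beta-distributed observations in a mean/precision form. None of these values come from the paper, and fitting the time-dependent mixture itself is outside the scope of the sketch.

```python
import random


def simulate_series(n_series=60, n_years=6, precision=50.0, seed=0):
    """Draw Beta-distributed quote shares with known cluster labels and trends."""
    rng = random.Random(seed)
    # Hypothetical ground truth: (starting mean, yearly slope) for each cluster.
    clusters = [(0.20, 0.0), (0.30, 0.0), (0.25, 0.02)]
    labels, data = [], []
    for _ in range(n_series):
        k = rng.randrange(len(clusters))
        start, slope = clusters[k]
        row = []
        for t in range(n_years):
            mean = start + slope * t
            row.append(rng.betavariate(mean * precision, (1.0 - mean) * precision))
        labels.append(k)
        data.append(row)
    return labels, data, clusters


labels, data, clusters = simulate_series()
# Known share of series generated without a trend; a fitted model should recover it.
true_stable_share = sum(clusters[k][1] == 0.0 for k in labels) / len(labels)
print(len(data), round(true_stable_share, 3))
```

Comparing the fitted cluster labels and the estimated share of non-improving series against `labels` and `true_stable_share` would then show whether stability can appear as an artifact of the model.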

minor comments (2)

[§2] Notation for the time-dependent mixing weights and the Beta kernel parameters should be introduced with explicit equations rather than prose descriptions only.
[Figure 3] Figure captions for the dynamic density plots should state the exact time windows compared and the bandwidth or smoothing parameter used.
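As an indication of the kind of explicit notation the first minor comment asks for, a generic time-dependent Beta mixture can be written as follows. This is a textbook-style form under assumed symbols (observation $y_{i,t}$, mixing measure $G_t$, weights $w_k(t)$, kernel mean $\mu_k$ and precision $\phi_k$), not the paper's own notation, which is not reproduced in this summary.

```latex
y_{i,t} \mid G_t \;\sim\; \int \mathrm{Beta}\bigl(y \mid \mu\phi,\; (1-\mu)\phi\bigr)\, \mathrm{d}G_t(\mu, \phi),
\qquad
G_t = \sum_{k=1}^{\infty} w_k(t)\, \delta_{(\mu_k, \phi_k)},
\qquad
w_k(t) \ge 0, \quad \sum_{k=1}^{\infty} w_k(t) = 1 .
```

In this form the time dependence enters only through the weights $w_k(t)$; a model could equally let the atoms $(\mu_k, \phi_k)$ vary with $t$, and the summary does not say which choice the authors make.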

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive report and for highlighting areas where additional validation would strengthen the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns about model recovery and data sensitivity.

Referee: [§3] §3 (Model specification and fitting): the central claims (structural under-representation across clusters, topic dominance, >85% non-improving series, stable aggregate density) rest on the time-dependent Beta mixture correctly recovering latent cluster assignments and temporal trends. No simulation recovery experiment is described that injects known cluster labels, known improving vs. stable trajectories, and known topic/region effects and then verifies that the fitted model returns the reported proportions and ordering. Without this check, misspecification in the time-dependence mechanism or nonparametric prior could produce the observed stability and topic dominance as artifacts.

Authors: We agree that a simulation-based recovery study is a valuable addition to substantiate the model's ability to recover the reported structures. In the revised manuscript we will insert a new subsection (likely in §3) that generates synthetic datasets with known cluster labels, known improving versus stable trajectories, and known topic/region effects. We will then fit the time-dependent Beta mixture and report quantitative recovery metrics, including adjusted Rand index for cluster assignments, mean absolute error on trend slopes, and whether the model recovers the >85% non-improving proportion and the topic-over-region dominance ordering. This will directly test whether the observed stability and topic dominance can arise as artifacts.

revision: yes

Referee: [§4] §4 (Results): data collection and quote-extraction details (article sampling frame, quote attribution rules, handling of multiple quotes per article) are not accompanied by sensitivity checks or bias diagnostics under the model. These steps are load-bearing for the claim that topic drives differences more strongly than region.

Authors: We acknowledge that the current manuscript provides limited sensitivity diagnostics for the quote-extraction pipeline. In the revision we will expand the data section with explicit descriptions of the sampling frame, attribution rules, and multiple-quote handling. We will also add a sensitivity subsection that re-runs the full pipeline under alternative attribution thresholds, article subsampling schemes, and quote-count weightings, then quantifies the stability of the topic-versus-region dominance result (e.g., via changes in posterior topic coefficients and the proportion of non-improving series). Any material shifts will be reported transparently.

revision: yes
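The two recovery metrics named in the first response, adjusted Rand index for cluster assignments and mean absolute error on trend slopes, can be sketched in plain Python as below. The label vectors and slope values are invented for illustration; in a real study they would come from the synthetic ground truth and the fitted model.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions put everything in one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_absolute_error(true_vals, est_vals):
    """Average absolute difference between paired values."""
    return sum(abs(t - e) for t, e in zip(true_vals, est_vals)) / len(true_vals)


# Hypothetical ground truth versus fitted output.
true_labels = [0, 0, 1, 1, 2, 2]
fitted_labels = [1, 1, 0, 0, 2, 2]  # same partition, different label names
print(adjusted_rand_index(true_labels, fitted_labels))
print(mean_absolute_error([0.0, 0.02], [0.001, 0.018]))
```

The index depends only on which items are grouped together, so relabelled but identical partitions score 1, while agreement at chance level scores about 0.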

Circularity Check

0 steps flagged

Empirical model fitting on observed quote shares; no reduction of claims to fitted inputs by construction

The paper applies a nonparametric time-dependent Bayesian mixture model with Beta kernel to Canadian news article data (2019-2024) on female quote shares. Reported results (under-representation across clusters, topic > region effects, >85% of topic-region series showing no improvement, stable aggregate density) are direct empirical summaries of the posterior from fitting the model to the data. No equations or claims in the provided text reduce these quantities to quantities defined solely by the model parameters themselves, nor do any self-citations serve as load-bearing justifications for uniqueness or ansatz choices. The derivation is a standard application of existing mixture modeling techniques to new data and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Model rests on standard Bayesian nonparametric assumptions plus data-specific choices for the Beta kernel and time dynamics; no invented entities. Free parameters are the mixture hyperparameters and cluster count (implicit in nonparametric setup).

free parameters (1)

mixture hyperparameters and effective number of clusters
Dirichlet process or equivalent concentration parameters and Beta kernel shape parameters are fitted or chosen to match the quote share data.

axioms (1)

domain assumption: Female quote shares arise from a mixture of Beta distributions whose parameters evolve over time according to the nonparametric model.
Central modeling premise stated in the abstract for capturing latent structures.
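To show how a concentration parameter governs the effective number of clusters, here is a minimal sketch of truncated stick-breaking weights for a Dirichlet process, following the standard construction. The ledger above says only "Dirichlet process or equivalent", so treating the prior as a plain Dirichlet process, and the values of `alpha` and the truncation level, are assumptions.

```python
import random


def stick_breaking_weights(alpha, n_components, seed=0):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Truncation: the last component takes whatever mass is left.
    weights.append(remaining)
    return weights


for alpha in (0.5, 5.0):
    w = stick_breaking_weights(alpha, n_components=20)
    n_large = sum(x > 0.01 for x in w)
    print(alpha, n_large, round(sum(w), 6))
```

Smaller `alpha` tends to concentrate mass on a few components and larger `alpha` spreads it over many, which is why the ledger counts the concentration parameter and the effective number of clusters as a single free choice.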

Reference graph

Works this paper leans on

47 extracted references · 1 linked inside Pith

[1] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle markov chain monte carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(3):269–342, 2010
[2] David A Binder. Bayesian cluster analysis. Biometrika, 65(1):31–38, 1978
[3] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022, 2003
[4] Laia Castro. Measuring partisan media bias cross-nationally. Swiss Political Science Review, 27(2):412–433, 2021
[5] Marta R. Costa-jussà. An analysis of gender bias studies in natural language processing. Nature Machine Intelligence, 1:495–496, 2019
[6] Jamell Dacon and Haochen Liu. Does gender matter in the news? detecting and examining gender bias in news articles. In Companion Proceedings of the Web Conference 2021, pages 385–392, 2021
[7] Maria De Iorio, Stefano Favaro, Alessandra Guglielmi, and Lifeng Ye. Bayesian nonparametric mixture modeling for temporal dynamics of gender stereotypes. The Annals of Applied Statistics, 17(3):2256–2278, 2023
[8] Noura El Masry, Zina Sawaf, Gretchen King, and Sami Baroudi. Gender hierarchies in reporting genocide: an analysis of the dehumanization of palestinian men in western media. Communication, Culture & Critique, 18(4):310–321, 2025
[9] Thomas S Ferguson. A bayesian analysis of some non-parametric problems. The Annals of Statistics, 1(2):209–230, 1973
[10] Gilad Greenwald. Israeli media coverage of international male and female politicians: Gender and ethnopolitical aspects. Communications, 48(2):226–248, 2023
[11] Jim E Griffin and Mark FJ Steel. Stick-breaking autoregressive processes. Journal of econometrics, 162(2):383–396, 2011
[12] Jim E Griffin and MF J Steel. Order-based dependent dirichlet processes. Journal of the American statistical Association, 101(473):179–194, 2006
[13] Luis Gutiérrez, Ramsés H Mena, and Matteo Ruggiero. A time dependent bayesian nonparametric model for air quality analysis. Computational Statistics & Data Analysis, 95:161–175, 2016
[14] Isabella Habereder, Thomas Kneib, Isao Echizen, and Timo Spinde. A systematic review of spatio-temporal statistical models: Theory, structure, and applications. arXiv preprint arXiv:2511.00422, 2025
[15] Marc Hooghe, Laura Jacobs, and Ellen Claes. Enduring gender bias in reporting on political elite positions: Media coverage of female mps in belgian news broadcasts (2003–2011). The International Journal of Press/Politics, 20(4):395–414, 2015
[16] Tomáš Horych, Christoph Mandl, Terry Ruas, André Greiner-Petter, Bela Gipp, Akiko Aizawa, and Timo Spinde. The promises and pitfalls of llm annotations in dataset labeling: A case study on media bias detection. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1370–1386, 2025
[17] Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar, Cynthia Carter, and Nello Cristianini. Women are seen more than heard in online newspapers. PloS one, 11(2):e0148434, 2016
[18] Devin K Joshi, Meseret F Hailu, and Lauren J Reising. Violators, virtuous, or victims? how global newspapers represent the female member of parliament. Feminist Media Studies, 20(5):692–712, 2020
[19] Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. Fasttext.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016
[20] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomáš Mikolov. Bag of tricks for efficient text classification. In Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 2, short papers, pages 427–431, 2017
[21] Albert Kolb. Die geographie und die kulturerdteile. In A. Leidlmair, editor, Hermann von Wissmann-Festschrift, page 46. Geographisches Institut der Universität Tübingen, 1962
[22] Athanasios Kottas. Dirichlet process mixtures of beta distributions, with applications to density and intensity estimation. In Workshop on Learning with Nonparametric Bayesian Methods, 23rd International Conference on Machine Learning (ICML), volume 47, 2006
[23] Athanasios Kottas and Bruno Sansó. Bayesian mixture modeling for spatial poisson process intensities, with applications to extreme value analysis. Journal of Statistical Planning and Inference, 137(10):3151–3163, 2007
[24] John W Lau and Peter J Green. Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16(3):526–558, 2007
[25] Changwoo J Lee, Alessandro Zito, Huiyan Sang, and David B Dunson. Logistic-beta processes for dependent random probabilities with beta marginals. Bayesian Analysis, 20(4):1345–1369, 13 2025
[26] Sohyun Lee. The specific visuality of women of the global south in the media of the global north. Humanities and Social Sciences Communications, 11(1):1–10, 2024
[27] Albert Y Lo. On a class of bayesian nonparametric estimates: I. density estimates. The annals of statistics, pages 351–357, 1984
[28] Santiago Marin, Bronwyn Loong, and Anton H Westveld. Bayesian nonparametric modeling of dynamic pollution clusters through an autoregressive logistic-beta stirling-gamma process. arXiv preprint arXiv:2601.04625, 2026
[29] Marina Meilă. Comparing clusterings—an information based distance. Journal of multivariate analysis, 98(5):873–895, 2007
[30] J. Newig. Das konzept der kulturerdteile, 2014. URL http://www.kulturerdteile.de/kulturerdteile/. Last access on 24.03.2026
[31] Garritt L Page, Fernando A Quintana, and David B Dahl. Dependent modeling of temporal sequences of random partitions. Journal of Computational and Graphical Statistics, 31(2):614–627, 2022
[32] Kate Power, Lucy Rak, and Marianne Kim. Women in business media: A critical discourse analysis of representations of women in forbes, fortune and bloomberg businessweek, 2015–2017. Critical Approaches to Discourse Analysis Across Disciplines, 11(2):1–26, 2019
[33] Prashanth Rao and Maite Taboada. Gender bias in the news: A scalable topic modelling and visualization framework. Frontiers in artificial intelligence, 4:664737, 2021
[34] Andreas A Riedl, Tobias Rohrbach, and Christina Krakovsky. “i can’t just pull a woman out of a hat”: A mixed-methods study on journalistic drivers of women’s representation in political news. Journalism & Mass Communication Quarterly, 101(3):679–702, 2024
[35] Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951
[36] Gareth O Roberts and Jeffrey S Rosenthal. Optimal scaling for various metropolis-hastings algorithms. Statistical science, 16(4):351–367, 2001
[37] Jayaram Sethuraman. A constructive definition of dirichlet priors. Statistica sinica, pages 639–650, 1994
[38] Eran Shor, Arnout Van De Rijt, Alex Miltsov, Vivek Kulkarni, and Steven Skiena. A paper ceiling: Explaining the persistent underrepresentation of women in printed news. American Sociological Review, 80(5):960–984, 2015
[39] Eran Shor, Arnout Van De Rijt, and Babak Fotouhi. A large-scale test of gender bias in the media. Sociological science, 6:526–550, 2019
[40] Eran Shor, Arnout Van de Rijt, and Alex Miltsov. Do women in the newsroom make a difference? coverage sentiment toward women and men as a function of newsroom composition. Sex Roles, 81(1):44–58, 2019
[41] Michael Smithson and Jay Verkuilen. A better lemon squeezer? maximum-likelihood regression with beta-distributed dependent variables. Psychological methods, 11(1):54, 2006
[42] Timo Spinde. Automated Detection of Media Bias: From the Conceptualization of Media Bias to its Computational Classification. Springer Nature, 2025
[43] Maite Taboada. Reported speech and gender in the news: Who is quoted, how are they quoted, and why it matters. Discourse & Communication, 19(1):93–113, 2025
[44] Linda Trimble, Jennifer Curtin, Angelia Wagner, Meagan Auer, VKG Woodman, and Bethan Owens. Gender novelty and personalized news coverage in australia and canada. International Political Science Review, 42(2):164–178, 2021
[45] Daphne Joanna Van der Pas and Loes Aaldering. Gender differences in political media coverage: A meta-analysis. Journal of Communication, 70(1):114–143, 2020
[46] Sara Wade and Zoubin Ghahramani. Bayesian cluster analysis: Point estimation and credible balls (with discussion). Bayesian Analysis, 13(2):559–626, 2018
[47] Appendix. In section 8.1 we present the figures mentioned in section 5. Furthermore, we use the appendix to describe the data pre-processing in section 8.2. In section 8.3, we present the parameter selection and validation of the final model used to obtain the results in section 5. In section 8.4 we describe the results obtained for the country×topic×y...

[1] [1]

Particle markov chain monte carlo methods.Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(3):269–342, 2010

Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle markov chain monte carlo methods.Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(3):269–342, 2010

2010

[2] [2]

Bayesian cluster analysis.Biometrika, 65(1):31–38, 1978

David A Binder. Bayesian cluster analysis.Biometrika, 65(1):31–38, 1978

1978

[3] [3]

Latent dirichlet allocation.Journal of machine Learning research, 3(Jan):993–1022, 2003

David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation.Journal of machine Learning research, 3(Jan):993–1022, 2003

2003

[4] [4]

Measuring partisan media bias cross-nationally.Swiss Political Science Review, 27 (2):412–433, 2021

Laia Castro. Measuring partisan media bias cross-nationally.Swiss Political Science Review, 27 (2):412–433, 2021

2021

[5] [5]

Costa-juss` a

Marta R. Costa-juss` a. An analysis of gender bias studies in natural language processing.Nature Machine Intelligence, 1:495–496, 2019

2019

[6] Jamell Dacon and Haochen Liu. Does gender matter in the news? detecting and examining gender bias in news articles. In Companion Proceedings of the Web Conference 2021, pages 385–392, 2021.

[7] Maria De Iorio, Stefano Favaro, Alessandra Guglielmi, and Lifeng Ye. Bayesian nonparametric mixture modeling for temporal dynamics of gender stereotypes. The Annals of Applied Statistics, 17(3):2256–2278, 2023.

[8] Noura El Masry, Zina Sawaf, Gretchen King, and Sami Baroudi. Gender hierarchies in reporting genocide: an analysis of the dehumanization of palestinian men in western media. Communication, Culture & Critique, 18(4):310–321, 2025.

[9] Thomas S Ferguson. A bayesian analysis of some non-parametric problems. The Annals of Statistics, 1(2):353–355, 1973.

[10] Gilad Greenwald. Israeli media coverage of international male and female politicians: Gender and ethnopolitical aspects. Communications, 48(2):226–248, 2023.

[11] Jim E Griffin and Mark FJ Steel. Stick-breaking autoregressive processes. Journal of econometrics, 162(2):383–396, 2011.

[12] Jim E Griffin and Mark FJ Steel. Order-based dependent dirichlet processes. Journal of the American statistical Association, 101(473):179–194, 2006.

[13] Luis Gutiérrez, Ramsés H Mena, and Matteo Ruggiero. A time dependent bayesian nonparametric model for air quality analysis. Computational Statistics & Data Analysis, 95:161–175, 2016.

[14] Isabella Habereder, Thomas Kneib, Isao Echizen, and Timo Spinde. A systematic review of spatio-temporal statistical models: Theory, structure, and applications. arXiv preprint arXiv:2511.00422, 2025.

[15] Marc Hooghe, Laura Jacobs, and Ellen Claes. Enduring gender bias in reporting on political elite positions: Media coverage of female mps in belgian news broadcasts (2003–2011). The International Journal of Press/Politics, 20(4):395–414, 2015.

[16] Tomáš Horych, Christoph Mandl, Terry Ruas, André Greiner-Petter, Bela Gipp, Akiko Aizawa, and Timo Spinde. The promises and pitfalls of llm annotations in dataset labeling: A case study on media bias detection. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1370–1386, 2025.

[17] Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar, Cynthia Carter, and Nello Cristianini. Women are seen more than heard in online newspapers. PloS one, 11(2):e0148434, 2016.

[18] Devin K Joshi, Meseret F Hailu, and Lauren J Reising. Violators, virtuous, or victims? how global newspapers represent the female member of parliament. Feminist Media Studies, 20(5):692–712, 2020.

[19] Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016.

[20] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomáš Mikolov. Bag of tricks for efficient text classification. In Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 2, short papers, pages 427–431, 2017.

[21] Albert Kolb. Die geographie und die kulturerdteile. In A. Leidlmair, editor, Hermann von Wissmann-Festschrift, page 46. Geographisches Institut der Universität Tübingen, 1962.

[22] Athanasios Kottas. Dirichlet process mixtures of beta distributions, with applications to density and intensity estimation. In Workshop on Learning with Nonparametric Bayesian Methods, 23rd International Conference on Machine Learning (ICML), volume 47, 2006.

[23] Athanasios Kottas and Bruno Sansó. Bayesian mixture modeling for spatial poisson process intensities, with applications to extreme value analysis. Journal of Statistical Planning and Inference, 137(10):3151–3163, 2007.

[24] John W Lau and Peter J Green. Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics, 16(3):526–558, 2007.

[25] Changwoo J Lee, Alessandro Zito, Huiyan Sang, and David B Dunson. Logistic-beta processes for dependent random probabilities with beta marginals. Bayesian Analysis, 20(4):1345–1369, 2025.

[26] Sohyun Lee. The specific visuality of women of the global south in the media of the global north. Humanities and Social Sciences Communications, 11(1):1–10, 2024.

[27] Albert Y Lo. On a class of bayesian nonparametric estimates: I. density estimates. The annals of statistics, pages 351–357, 1984.

[28] Santiago Marin, Bronwyn Loong, and Anton H Westveld. Bayesian nonparametric modeling of dynamic pollution clusters through an autoregressive logistic-beta stirling-gamma process. arXiv preprint arXiv:2601.04625, 2026.

[29] Marina Meilă. Comparing clusterings—an information based distance. Journal of multivariate analysis, 98(5):873–895, 2007.

[30] J. Newig. Das konzept der kulturerdteile, 2014. URL http://www.kulturerdteile.de/kulturerdteile/. Last access on 24.03.2026.

[31] Garritt L Page, Fernando A Quintana, and David B Dahl. Dependent modeling of temporal sequences of random partitions. Journal of Computational and Graphical Statistics, 31(2):614–627, 2022.

[32] Kate Power, Lucy Rak, and Marianne Kim. Women in business media: A critical discourse analysis of representations of women in forbes, fortune and bloomberg businessweek, 2015–2017. Critical Approaches to Discourse Analysis Across Disciplines, 11(2):1–26, 2019.

[33] Prashanth Rao and Maite Taboada. Gender bias in the news: A scalable topic modelling and visualization framework. Frontiers in artificial intelligence, 4:664737, 2021.

[34] Andreas A Riedl, Tobias Rohrbach, and Christina Krakovsky. “i can’t just pull a woman out of a hat”: A mixed-methods study on journalistic drivers of women’s representation in political news. Journalism & Mass Communication Quarterly, 101(3):679–702, 2024.

[35] Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400–407, 1951.

[36] Gareth O Roberts and Jeffrey S Rosenthal. Optimal scaling for various metropolis-hastings algorithms. Statistical science, 16(4):351–367, 2001.

[37] Jayaram Sethuraman. A constructive definition of dirichlet priors. Statistica sinica, pages 639–650, 1994.

[38] Eran Shor, Arnout Van De Rijt, Alex Miltsov, Vivek Kulkarni, and Steven Skiena. A paper ceiling: Explaining the persistent underrepresentation of women in printed news. American Sociological Review, 80(5):960–984, 2015.

[39] Eran Shor, Arnout Van De Rijt, and Babak Fotouhi. A large-scale test of gender bias in the media. Sociological science, 6:526–550, 2019.

[40] Eran Shor, Arnout Van de Rijt, and Alex Miltsov. Do women in the newsroom make a difference? coverage sentiment toward women and men as a function of newsroom composition. Sex Roles, 81(1):44–58, 2019.

[41] Michael Smithson and Jay Verkuilen. A better lemon squeezer? maximum-likelihood regression with beta-distributed dependent variables. Psychological methods, 11(1):54, 2006.

[42] Timo Spinde. Automated Detection of Media Bias: From the Conceptualization of Media Bias to its Computational Classification. Springer Nature, 2025.

[43] Maite Taboada. Reported speech and gender in the news: Who is quoted, how are they quoted, and why it matters. Discourse & Communication, 19(1):93–113, 2025.

[44] Linda Trimble, Jennifer Curtin, Angelia Wagner, Meagan Auer, VKG Woodman, and Bethan Owens. Gender novelty and personalized news coverage in australia and canada. International Political Science Review, 42(2):164–178, 2021.

[45] Daphne Joanna Van der Pas and Loes Aaldering. Gender differences in political media coverage: A meta-analysis. Journal of Communication, 70(1):114–143, 2020.

[46] Sara Wade and Zoubin Ghahramani. Bayesian cluster analysis: Point estimation and credible balls (with discussion). Bayesian Analysis, 13(2):559–626, 2018.
