pith. machine review for the scientific record.

arxiv: 2605.03962 · v1 · submitted 2026-05-05 · 💻 cs.SI

Recognition: unknown

Demographic Divides in Political Content Exposure on Facebook

Pith reviewed 2026-05-07 12:29 UTC · model grok-4.3

classification 💻 cs.SI
keywords Facebook, political content exposure, information diet, demographic disparities, longitudinal data, platform interventions, civic information

The pith

Political content forms only 18% of Facebook users' potential information diets, with large persistent differences across age, gender, and race.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a decade-long dataset from the full lists of public pages and groups followed by more than 1,100 American users and analyzes hundreds of millions of posts between 2012 and 2023 to map what information those users could have seen. It finds that political material makes up a modest 18% share while lifestyle and entertainment topics dominate the rest, yet this average hides steady gaps: different demographic groups see different volumes and different ideological leans of political posts. The study also shows that political discussion often leaks into non-political pages and that a 2018 platform change raised the political share by shrinking visibility of everything else. A reader would care because the work gives a platform-independent, long-term view of exposure rather than just clicks or shares, which matters for judging how Facebook shapes civic life.
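The mechanism described above, a political share that rises because everything else shrinks, is plain ratio arithmetic. The following is a minimal sketch under stated assumptions: the post counts and the 40% contraction are hypothetical values chosen for illustration, with the baseline set to match the reported 18% share. None of these numbers come from the paper.

```python
def political_share(political_posts: float, other_posts: float) -> float:
    """Fraction of visible posts that are political."""
    total = political_posts + other_posts
    if total == 0:
        raise ValueError("no visible posts")
    return political_posts / total


# Hypothetical baseline: 18 political posts out of 100 visible posts.
before = political_share(18, 82)

# Assumption: non-political visibility contracts by 40% while the
# political volume stays exactly the same.
after = political_share(18, 82 * 0.6)

print(f"share before: {before:.3f}, share after: {after:.3f}")
```

The point of the sketch is that a higher political share does not by itself imply more political posts; the denominator can do all the work.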

Core claim

The authors collect the complete lists of public pages and groups followed by over 1,100 users and examine the posts those accounts produced from 2012 to 2023. From this they establish that political content constitutes 18% of a user's potential information diet, which is otherwise dominated by lifestyle and entertainment topics. They also report significant and stable disparities in both the amount and the ideological direction of political content across age, gender, and racial groups, political content appearing inside non-political categories, and a sharp rise in the political share after the 2018 Meaningful Social Interactions update.

What carries the argument

The longitudinal dataset built from each user's full list of followed public pages and groups, used as a proxy for potential information exposure across hundreds of millions of posts.
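One way to picture this proxy is as a join between each user's follow list and a corpus of labeled posts. The sketch below is illustrative only: the user names, source names, labels, and the `diet_shares` helper are all hypothetical, and the paper's actual pipeline and weighting scheme are not described in this review.

```python
from collections import defaultdict

# Hypothetical follow lists: user -> public pages/groups they follow.
follows = {
    "user_a": ["news_page", "cooking_group"],
    "user_b": ["cooking_group", "sports_page"],
}

# Hypothetical labeled posts: (source, is_political).
posts = [
    ("news_page", True), ("news_page", True), ("news_page", False),
    ("cooking_group", False), ("cooking_group", True), ("cooking_group", False),
    ("sports_page", False), ("sports_page", False),
]


def diet_shares(follows, posts):
    """Return each user's political share of posts from followed sources."""
    counts = defaultdict(lambda: [0, 0])  # source -> [political, total]
    for source, is_political in posts:
        counts[source][0] += int(is_political)
        counts[source][1] += 1
    shares = {}
    for user, sources in follows.items():
        political = sum(counts[s][0] for s in sources)
        total = sum(counts[s][1] for s in sources)
        shares[user] = political / total if total else float("nan")
    return shares


shares = diet_shares(follows, posts)
print(shares)
```

Note that the toy `cooking_group` contains one political post, so a user who follows no political source still ends up with a non-zero political share. Labeling at the post level rather than the page level is what makes that kind of leakage visible.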

If this is right

  • Assessments of Facebook's role in civic life must measure exposure directly rather than rely on engagement metrics alone.
  • Demographic groups experience substantially different volumes and ideological leans of political content over long periods.
  • Platform ranking changes can quickly increase or decrease the political portion of users' information diets.
  • Political discourse circulates inside lifestyle and entertainment spaces, so boundaries between categories are porous.
  • Longitudinal, user-follow-based data can reveal patterns that short-term or aggregate studies miss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed demographic gaps may help explain differences in political knowledge or attitudes across groups.
  • If the public-follows measure is valid, then private groups and direct messages could add even more variation to actual exposure.
  • The same followed-pages method could be applied to other platforms to test whether similar modest shares and divides appear elsewhere.
  • Efforts to reduce political polarization on social media may need to address content in non-political spaces as well.

Load-bearing premise

The lists of public pages and groups followed by the users accurately represent their overall potential information environment without major bias from missing private groups or unlisted content.

What would settle it

A comparison using the same users' actual post impressions or engagement data that shows a political share far from 18% would indicate the followed-pages proxy does not capture real exposure.

Figures

Figures reproduced from arXiv: 2605.03962 by Joao Couto, Kiran Garimella, S M Mehedi Zaman.

Figure 1: Fraction of content from various sources shown on user’s feeds on Facebook.
Figure 2: Fraction of users by ethnicity and gender.
Figure 3: Fraction of users by ethnicity and age.
Figure 4: Exposure to political content by: (a) ethnicity, (b) age group, (c) gender, (d) gender and ethnicity.
Figure 5: Partisan content exposure leaning by: (a) ethnicity, (b) age group, (c) gender.
Figure 6: Relative effect of the three interventions on the …
Figure 7: Relative effect of the three interventions on the …
Figure 8: Prevalence of political content per age bracket over 10 years. The black dotted vertical lines are in November of (2014, 2016, …
Figure 9: Prevalence of political content per ethnicity over 10 years.
Figure 10: Share of left and right leaning content by ethnicity.
Figure 11: Share of left and right leaning content by age.
Figure 12: Share of left and right leaning content by gender.
Figure 13: Share of left leaning content by ethnicity over time.
Figure 14: Share of left leaning content by age over time.
Figure 15: Share of left leaning content by gender over time.
Figure 16: Distribution of average partisan leaning across four page categories.
Figure 17: Content types by age group.
Figure 18: Content types by gender and ethnicity.
Figure 19: News exposure by age group.
original abstract

Despite Facebook's central role in American civic life, a clear, evidence-based understanding of users' long-term information environments has remained elusive, hindering assessments of the platform's societal impact. This study addresses that gap by analyzing a unique decade-long dataset, constructed by collecting the full list of public pages and groups followed by over 1,100 American users. This approach allows us to examine the potential information exposure of these users by analyzing hundreds of millions of posts from 2012 to 2023. We find that political content constitutes a modest 18% of a user's potential information diet, which is predominantly composed of lifestyle and entertainment topics. This aggregate view, however, masks a deeply stratified reality: we uncover significant and persistent disparities in the volume and ideological leaning of political content across age, gender, and racial lines. Furthermore, we quantify the porous boundaries between content categories, showing how political discourse frequently permeates non-political spaces. Leveraging the dataset's longitudinal nature, we also assess the impact of major platform interventions. We find that Meta's 2018 "Meaningful Social Interactions" update dramatically increased the share of political content by contracting the visibility of non-political posts. By providing a granular, decade-long map of potential information exposure, our study offers one of the first representative and longitudinal picture drawn from platform-independent data. Our findings underscore the critical need for researchers to measure exposure, not merely engagement, and to account for the significant volume of political content that circulates in non-political spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper analyzes a decade-long corpus of posts (2012–2023) from public pages and groups followed by more than 1,100 U.S. Facebook users to characterize potential information diets. It reports that political content comprises 18% of this diet (with the remainder dominated by lifestyle and entertainment), documents persistent demographic disparities in political volume and ideological slant by age, gender, and race, shows political discourse leaking into non-political categories, and finds that Meta’s 2018 Meaningful Social Interactions update increased the political share by reducing non-political visibility.

Significance. If the measurement pipeline is sound, the work supplies one of the few long-horizon, platform-independent maps of potential exposure rather than engagement. The longitudinal span, the quantification of cross-category leakage, and the before/after assessment of a major platform change are genuine strengths that could inform both academic and policy discussions on information environments.

major comments (3)
  1. [§3, Data and Methods] The manuscript provides no description of the classifier or labeling procedure used to designate posts as political versus lifestyle/entertainment. Without precision, recall, or inter-annotator details for this step, the headline 18% figure and all subsequent demographic contrasts rest on an opaque measurement whose error structure is unknown.
  2. [§3 and §5] The central claim that the collected public-page/group list constitutes a representative proxy for each user’s “potential information diet” is asserted without external validation or sensitivity checks. The paper does not quantify the share of actual exposure that occurs via private groups, friend reshares, or algorithmic recommendations outside the followed set; if political material is over-represented in those channels, both the aggregate 18% and the reported age/gender/race gaps could be systematically misestimated.
  3. [§4.2, Demographic stratification] The text does not report how self-reported or inferred demographic variables were coded, cleaned, or controlled for confounding (e.g., differential page-following rates by age). The absence of these operational details makes it impossible to assess whether the observed disparities are robust to alternative codings or selection corrections.
minor comments (2)
  1. [Abstract] The phrase “one of the first representative and longitudinal picture” is grammatically awkward; consider rephrasing for clarity.
  2. [Discussion] The paper would benefit from an explicit comparison table placing the 18% political share against prior estimates from engagement-based or survey-based studies.
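
The reliability details requested in major comment 1 could be reported with standard metrics. The following is a minimal sketch, assuming binary political/non-political labels. The label vectors are hypothetical toy data, and the helper names `precision_recall` and `cohens_kappa` are illustrative; the review does not state which metrics or tooling the authors used.

```python
def precision_recall(gold, predicted):
    """Precision and recall for the positive (political) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning binary labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n  # annotator A's rate of "political"
    p_b = sum(labels_b) / n  # annotator B's rate of "political"
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (True = political); not data from the paper.
gold_labels = [True, True, True] + [False] * 7
model_labels = [True, True, False, True] + [False] * 6
second_annotator = [True, True, True] + [False] * 6 + [True]

precision, recall = precision_recall(gold_labels, model_labels)
kappa = cohens_kappa(gold_labels, second_annotator)
print(f"precision={precision:.3f} recall={recall:.3f} kappa={kappa:.3f}")
```

Reporting precision and recall separately matters here because a classifier that misses political posts inside lifestyle pages would bias the 18% figure downward, while one that over-flags would bias it upward.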

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

point-by-point responses
  1. Referee: §3 (Data and Methods): the manuscript provides no description of the classifier or labeling procedure used to designate posts as political versus lifestyle/entertainment. Without precision, recall, or inter-annotator details for this step, the headline 18% figure and all subsequent demographic contrasts rest on an opaque measurement whose error structure is unknown.

    Authors: We agree that the current manuscript lacks sufficient detail on the classification procedure. In the revised version we will expand Section 3 to describe the classifier, the labeling protocol, precision and recall metrics, and inter-annotator agreement statistics so that the reliability of the 18% figure and downstream contrasts can be properly evaluated. revision: yes

  2. Referee: §3 and §5: the central claim that the collected public-page/group list constitutes a representative proxy for each user’s “potential information diet” is asserted without external validation or sensitivity checks. The paper does not quantify the share of actual exposure that occurs via private groups, friend reshares, or algorithmic recommendations outside the followed set; if political material is over-represented in those channels, both the aggregate 18% and the reported age/gender/race gaps could be systematically misestimated.

    Authors: Our data consist exclusively of posts from the public pages and groups followed by the panelists, which we treat as a proxy for potential exposure through those channels. We cannot quantify the share of exposure occurring via private groups, friend reshares, or recommendations outside the followed set because those data are not available to us. In the revision we will add an explicit discussion of this scope limitation together with any sensitivity checks that can be performed with the existing data. revision: partial

  3. Referee: §4.2 (Demographic stratification): the text does not report how self-reported or inferred demographic variables were coded, cleaned, or controlled for confounding (e.g., differential page-following rates by age). The absence of these operational details makes it impossible to assess whether the observed disparities are robust to alternative codings or selection corrections.

    Authors: We will revise Section 4.2 to document the coding and cleaning procedures for all demographic variables and will add controls for potential confounders such as differential page-following rates by age. Robustness checks under alternative codings and specifications will also be reported. revision: yes

standing simulated objections not resolved
  • Quantifying the share of actual exposure from private groups, friend reshares, or algorithmic recommendations outside the followed set, because the dataset contains only public followed pages and groups.
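
Even without data on the unobserved channels, the objection can be bounded. The sketch below treats overall exposure as a weighted blend of the observed share and an unknown share in unobserved channels. The weights are hypothetical, and the helpers `blended_share` and `share_bounds` are illustrative names; this is not an analysis the paper reports.

```python
def blended_share(observed_share, unobserved_weight, unobserved_share):
    """Overall political share when a fraction of exposure is unobserved."""
    if not 0 <= unobserved_weight <= 1:
        raise ValueError("unobserved_weight must be in [0, 1]")
    return ((1 - unobserved_weight) * observed_share
            + unobserved_weight * unobserved_share)


def share_bounds(observed_share, unobserved_weight):
    """Worst-case range: unobserved channels are 0% or 100% political."""
    low = blended_share(observed_share, unobserved_weight, 0.0)
    high = blended_share(observed_share, unobserved_weight, 1.0)
    return low, high


# Assumption: try several hypothetical weights for unobserved exposure.
for weight in (0.0, 0.1, 0.25, 0.5):
    low, high = share_bounds(0.18, weight)
    print(f"unobserved weight {weight:.2f}: share in [{low:.3f}, {high:.3f}]")
```

The range widens quickly as the unobserved weight grows, which is why the referee's request for at least a rough estimate of that weight is load-bearing for the headline figure.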

Circularity Check

0 steps flagged

No circularity: purely observational data analysis

full rationale

The paper performs direct empirical computation on a collected dataset of followed public pages/groups and their posts. Quantities such as the 18% political share and demographic disparities are simple aggregates and stratifications of the observed post corpus; no equations, fitted parameters, or predictions are defined in terms of themselves. No self-citation chains, ansatzes, or uniqueness theorems are invoked to derive the central results. The study is self-contained against its own data collection protocol.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on empirical data collection and classification rather than formal axioms or derivations; the main unstated premises concern sample representativeness and content labeling.

axioms (2)
  • domain assumption: The 1,100-user sample is sufficiently representative of American Facebook users for generalizing demographic patterns.
    Invoked when the abstract presents findings as applying to users broadly.
  • domain assumption: Posts from followed public pages and groups constitute the relevant potential information diet.
    Core to the exposure measurement described in the abstract.

pith-pipeline@v0.9.0 · 5572 in / 1397 out tokens · 42732 ms · 2026-05-07T12:29:37.015584+00:00 · methodology
