MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media
Pith reviewed 2026-05-09 22:25 UTC · model grok-4.3
The pith
Co-occurrence networks reveal distinct reporting preferences across Indian news sources on the farmers' protests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that source-specific entity co-occurrence networks around the Farmers' Protests differ significantly in centrality, community structure, and link predictability across the four outlets. They read these differences as evidence of varied reporting preferences, alongside a consistent under-representation of farmer leaders.
What carries the argument
Entity co-occurrence networks in which shared mentions form links between entities, examined through centrality, community structure, and a link predictability measure that tracks consistency of associations over time.
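A minimal sketch of this construction, assuming each article has already been reduced to a list of entity names. The entity extraction step is not shown, the function names are illustrative rather than the authors' code, and the toy entity names are hypothetical, not taken from the paper's data. It adds one unit of link weight per article for every pair of entities mentioned together, then computes unweighted degree centrality.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(articles):
    """Map each unordered entity pair to the number of articles mentioning both."""
    weights = Counter()
    for entities in articles:
        # Deduplicate within an article so each article adds at most 1 per pair.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[frozenset((a, b))] += 1
    return weights


def degree_centrality(weights):
    """Fraction of the other nodes that each node is linked to."""
    neighbors = {}
    for pair in weights:
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical toy input: one list of extracted entity names per article.
toy_articles = [
    ["Government", "Farm Union", "Supreme Court"],
    ["Government", "Supreme Court"],
    ["Government", "Farm Union"],
    ["Government", "Police"],
]
weights = build_cooccurrence(toy_articles)
centrality = degree_centrality(weights)
```

On this toy input, `Government` co-occurs with every other entity and gets centrality 1.0, while `Police` appears in a single pairing and gets 1/3.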
If this is right
- Different outlets prioritize distinct sets of entities and their associations when covering identical events.
- Farmer leaders receive lower prominence in the networks constructed from every source examined.
- The link predictability metric can quantify stability in how media sources link entities across separate time periods.
- Relational patterns alone allow comparison of reporting behavior without reliance on textual labels or sentiment scores.
Where Pith is reading between the lines
- The same network construction could be used on coverage of other political events to identify recurring patterns of emphasis.
- If the observed differences in entity prominence align with audience reach or policy outcomes, they may help explain variations in public awareness of the protests.
- Combining the networks with basic checks on article length or placement could test whether structural signals match surface-level reporting volume.
Load-bearing premise
That the co-occurrence of entities in articles directly reflects the reporting preferences and potential biases of the media outlets without needing additional validation against ground truth or textual context.
What would settle it
A manual content analysis of the same articles would challenge the results if it found similar levels of entity emphasis across sources despite the computed differences in network centrality and predictability.
Original abstract
We present MediaGraph, a network-theoretic framework for analyzing reporting preferences in news media through entity co-occurrence networks. Using articles from four Indian news-sources, two mainstream (The Times of India and The Indian Express) and two fringe outlets (dna and firstpost), we construct source-specific co-occurrence networks around the 2020-21 and 2024 Farmers Protests. We analyze these networks along three network theoretic axes of centrality, community structure, and co-occurrence link predictability. The link predictability metric is a novel metric proposed that quantifies the consistency of entity associations over time using a GraphSAGE-based model. Our results reveal significant differences in reporting preferences across sources for the same event, and a consistent under-representation of farmer leaders across sources. By shifting the focus from textual signals to relational structures, our approach offers a scalable, label-independent perspective on media analysis and introduces link predictability as a complementary measure of reporting behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MediaGraph, a network-theoretic framework that builds source-specific entity co-occurrence networks from articles in four Indian news outlets (two mainstream: The Times of India and The Indian Express; two fringe: dna and firstpost) covering the 2020-21 and 2024 Farmers' Protests. It evaluates these networks along three axes—centrality, community structure, and a novel link-predictability metric implemented via GraphSAGE—to quantify differences in reporting preferences and consistency of entity associations over time. The central claims are that the networks reveal significant source-specific differences in reporting and a consistent under-representation of farmer leaders across all outlets.
Significance. If the mapping from co-occurrence structure to editorial preference can be validated, the framework offers a scalable, label-free complement to text-based media-bias methods and introduces a temporal consistency metric that could be useful in computational social science. The work is technically straightforward and leverages standard network tools, but its interpretive claims rest on an untested proxy assumption.
major comments (3)
- [§3 (Link Predictability)] The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.
- [§4 (Centrality and Under-representation)] Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.
- [§2 (Data Construction)] The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.
minor comments (2)
- [Abstract] The abstract states 'significant differences' without naming the statistical tests or effect-size thresholds used; these should be stated explicitly.
- [Figures] Network figures would benefit from consistent node labeling conventions and legends that distinguish farmer leaders from other entity types.
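One label-free way to back a "significant difference" statement is an exact permutation test on a difference in mean centrality between two groups of entities. The sketch below is an illustration only: the paper does not say which test it uses, the function name and toy scores are hypothetical, and the test treats the scores as exchangeable, which is itself an assumption for nodes drawn from the same network.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(x) + list(y)
    n, k = len(pooled), len(x)
    total = sum(pooled)

    def abs_mean_diff(group_sum):
        # |mean of the k-sized group - mean of the remaining n - k values|
        return abs(group_sum / k - (total - group_sum) / (n - k))

    observed = abs_mean_diff(sum(x))
    hits = count = 0
    # Enumerate every way of relabelling k of the n pooled values as group x.
    for idx in combinations(range(n), k):
        count += 1
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


# Hypothetical centrality scores, not values from the paper.
leader_scores = [0.1, 0.2]
other_scores = [0.6, 0.7, 0.8]
p_value = exact_permutation_pvalue(leader_scores, other_scores)
```

With five pooled values there are only ten relabellings, so the smallest attainable p-value here is 0.1; real comparisons would need larger groups or a sampled permutation scheme.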
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important methodological considerations regarding our assumptions and metrics. We address each point below and will incorporate revisions to strengthen the manuscript.
- Referee: §3 (Link Predictability): The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.
  Authors: We agree this is a valid concern and that the current setup risks conflating reconstruction with predictive consistency. We will revise Section 3 to implement an explicit temporal train/test split: the GraphSAGE model will be trained on the 2020-21 co-occurrence networks and evaluated on its ability to predict links in the 2024 networks for each source. We will also add a simple external baseline (e.g., preferential attachment or random prediction) for comparison. This change will be reflected in updated results and methodology.
  Revision: yes
- Referee: §4 (Centrality and Under-representation): Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.
  Authors: We acknowledge that the interpretation relies on a proxy and lacks direct ground-truth validation, which limits causal claims about editorial choice versus event facts. We cannot add comprehensive external lists or full annotations without new data collection. In revision, we will add an explicit limitations paragraph in Section 4 discussing this proxy nature, emphasizing that the under-representation finding is relative (consistent low centrality across all four sources) and that source-specific differences in other entities still support reporting preference claims. We will also outline future validation steps.
  Revision: partial
- Referee: §2 (Data Construction): The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.
  Authors: This assumption underpins the framework and merits explicit testing. We will revise Section 2 to include a validation subsection: a random sample of 100 articles per source will be manually inspected to confirm that high co-occurrence pairs reflect substantive joint coverage rather than incidental mentions. We will also add discussion of alternative explanations (e.g., event-driven coverage) and how cross-source comparisons help isolate preferences. Updated text and any revised figures will be included.
  Revision: yes
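The temporal split and external baseline promised in the first response could look roughly like the following. This is a sketch under stated assumptions, not the authors' implementation: it stands in for GraphSAGE with the preferential-attachment (degree product) score, uses degrees from the earlier period only, and labels every pair of entities seen in that period by whether the pair is linked in the later period. The function names and toy edges are hypothetical.

```python
from itertools import combinations


def degrees(edges):
    """Count how many links each node has (edges are 2-tuples, no self-loops)."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def preferential_attachment_auc(train_edges, test_edges):
    """AUC of the degree-product score, built from earlier-period degrees,
    at ranking later-period links above later-period non-links."""
    train = {frozenset(e) for e in train_edges}
    test = {frozenset(e) for e in test_edges}
    deg = degrees(tuple(e) for e in train)
    pos, neg = [], []
    # Candidate pairs: all pairs of entities observed in the earlier period.
    for a, b in combinations(sorted(deg), 2):
        score = deg[a] * deg[b]
        (pos if frozenset((a, b)) in test else neg).append(score)
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    # Probability that a random later-period link outscores a random non-link
    # (ties count as half).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy networks for one source: earlier period, then later period.
earlier = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
later = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
auc = preferential_attachment_auc(earlier, later)
```

A learned model evaluated on the same split would need to beat this baseline for its score to say more than "well-connected entities stay well connected". Counting recurring links as positives, as done here, measures consistency; excluding them would instead measure prediction of new links.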
Circularity Check
No significant circularity in derivation chain
The paper builds entity co-occurrence networks directly from article data for four sources and applies standard network measures (centrality, community structure) plus a proposed temporal link predictability metric via GraphSAGE. No equations or sections reduce any claimed result to its inputs by construction, nor do any load-bearing steps rely on self-citations that themselves assume the target outcome. The link predictability is framed as a consistency measure over time rather than a fitted parameter renamed as a prediction on the identical data; the central claims about reporting differences and under-representation follow from independent computation on the constructed graphs. The derivation remains self-contained against the input co-occurrence data without tautological collapse.