Decoding the Style and Bias of Song Lyrics
Pith reviewed 2026-05-24 19:51 UTC · model grok-4.3
The pith
Song lyrics from over half a million tracks carry gender and racial biases that align with measurements from human subjects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Distributed word representations trained on song lyrics, when tested with WEAT for gender and racial categories, produce bias measurements that correlate with prior results on human subjects, indicating that song lyrics reflect the biases that exist in society.
What carries the argument
Word embeddings derived from song lyrics combined with the Word Embedding Association Test (WEAT), which scores associations between target sets (e.g., male/female names) and attribute sets (e.g., career/family or pleasant/unpleasant terms).
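As a rough illustration of how WEAT turns embeddings into a single score, the sketch below computes the effect size following the definition in Caliskan et al. (2017): the difference between the two target sets' mean associations, divided by the standard deviation of the association over all target words. The toy 2-D vectors and word lists are invented for illustration and are not taken from the paper. Using the sample standard deviation is an assumption, since implementations differ on this point.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, vec):
    """s(w, A, B): mean similarity to A minus mean similarity to B."""
    sim_a = mean(cosine(vec[word], vec[a]) for a in attrs_a)
    sim_b = mean(cosine(vec[word], vec[b]) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b, vec):
    """WEAT effect size d for target sets X, Y and attribute sets A, B."""
    s_x = [association(x, attrs_a, attrs_b, vec) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b, vec) for y in targets_y]
    # Assumption: sample standard deviation over all target words.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings; real vectors would come from a model
# trained on the lyrics corpus.
vec = {
    "john": (1.0, 0.1), "paul": (0.9, 0.2),
    "amy": (0.1, 1.0), "lisa": (0.2, 0.9),
    "career": (1.0, 0.0), "salary": (0.9, 0.1),
    "family": (0.0, 1.0), "home": (0.1, 0.9),
}
X, Y = ["john", "paul"], ["amy", "lisa"]
A, B = ["career", "salary"], ["family", "home"]

print(weat_effect_size(X, Y, A, B, vec))
```

A positive value means the first target set sits closer to the first attribute set than the second target set does. Swapping the two target sets flips the sign.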
If this is right
- Popular songs exhibit measurably different stylistic properties than less popular songs across vocabulary, length, repetitiveness, speed, and readability.
- Gender and racial biases can be tracked quantitatively in lyrics over multiple decades using embedding methods.
- The correlation between lyric biases and human-subject biases supports the claim that lyrics mirror existing societal attitudes.
- Large-scale embedding analysis extends bias measurement to cultural artifacts previously studied only through small manual samples.
Where Pith is reading between the lines
- If lyrics both reflect and potentially reinforce biases, targeted changes in lyric content could be tested for downstream effects on listener attitudes.
- The same embedding-plus-WEAT pipeline could be applied to other large text collections such as movie scripts or social media to compare how different media embed biases.
- Decade-by-decade tracking of bias scores might reveal whether lyric biases have intensified or attenuated relative to societal measures over time.
Load-bearing premise
The assumption that WEAT scores computed on lyric-derived embeddings are directly comparable to WEAT scores from human-subject studies or other text corpora, allowing a correlation to mean that lyrics reflect societal biases.
What would settle it
A dataset of independent societal bias measurements (from surveys or other corpora) that shows no statistical correlation with the WEAT scores obtained from the song-lyric embeddings.
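One way such a check could be run is sketched below: a Pearson correlation between paired bias scores, with a permutation test for the p-value. The function names and the score values are placeholders made up for illustration. They are not results from the paper, and the paper does not state which correlation test it used.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Placeholder effect sizes, one pair per bias test (invented values).
lyric_scores = [0.2, 0.5, 0.9, 1.3, 0.7]
independent_scores = [0.1, 0.6, 0.8, 1.4, 0.5]

print(pearson_r(lyric_scores, independent_scores))
print(permutation_p_value(lyric_scores, independent_scores))
```

With only a handful of bias tests the permutation p-value is coarse, so the number of paired scores matters as much as the coefficient itself.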
Original abstract
The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: style and biases of song lyrics. All prior works to understand these two aspects are limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize the lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We have observed that the style of popular songs significantly differs from other songs. We have used distributed representation methods and WEAT test to measure various gender and racial biases in the song lyrics. We have observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. Increasing consumption of music and the effect of lyrics on human emotions makes this analysis important.
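The abstract names the style dimensions but does not define them here, so the following is only a sketch under assumed definitions: length as word count, vocabulary as the number of distinct words, repetitiveness as the share of repeated lines, and speed as words per second. The `duration_seconds` parameter and every formula below are hypothetical. Readability is left out because it needs syllable counting.

```python
import re


def tokenize(text):
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())


def style_metrics(lyrics, duration_seconds):
    """Compute assumed style metrics for one song."""
    lines = [ln.strip().lower() for ln in lyrics.splitlines() if ln.strip()]
    words = tokenize(lyrics)
    if not words or duration_seconds <= 0:
        raise ValueError("need non-empty lyrics and a positive duration")
    return {
        "length_words": len(words),
        "vocabulary_size": len(set(words)),
        # Share of lines that repeat an earlier line (assumed definition).
        "repetitiveness": 1 - len(set(lines)) / len(lines),
        # Words sung per second of track time (assumed definition).
        "speed_wps": len(words) / duration_seconds,
    }


sample = "la la la\nla la la\nhello world\n"
print(style_metrics(sample, duration_seconds=10))
```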
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes style and bias in >500k song lyrics over five decades. Style is quantified via vocabulary, length, repetitiveness, speed and readability, with popular songs shown to differ from others. Gender and racial biases are measured via word embeddings trained on the lyrics corpus and the WEAT test; observed correlations with prior human-subject WEAT/IAT results are interpreted as evidence that lyrics reflect societal biases.
Significance. A sound demonstration that WEAT scores on lyrics embeddings are commensurable with human-subject scales would supply large-scale, falsifiable evidence linking popular-culture text to societal bias patterns, strengthening the case for embedding-based cultural analytics in IR. The scale of the corpus is a clear asset, but the current manuscript supplies none of the required validation steps.
major comments (2)
- [Abstract] The headline claim that 'biases in song lyrics correlate with prior results on human subjects' is asserted without any reported correlation coefficient, p-value, word-set details, or embedding hyperparameters; the equivalence of WEAT d-values across lyrics versus general-text corpora is therefore untested and load-bearing for the societal-reflection interpretation.
- [Methods] (WEAT application section) No diagnostic is presented that checks whether rhyme-driven co-occurrence, high repetition, or genre skew in lyrics preserves the target-attribute association geometry on which WEAT was validated against human data; absent such a check, an observed numerical correlation could be an embedding artifact rather than shared bias content.
minor comments (2)
- [Abstract] 'more than half a million songs' should be replaced by the exact count and a citation to the data source.
- [Abstract] The abstract states that popular songs 'significantly differ' in style metrics but supplies no statistical test or effect-size table reference.
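For the last point, a minimal sketch of what an effect-size report could rest on is given below: Cohen's d with a pooled standard deviation, plus Welch's t statistic for two groups of per-song metric values. The numbers are invented placeholders, and the paper does not say which test it applied.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def welch_t(group_a, group_b):
    """Welch's t statistic, which does not assume equal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Invented metric values for popular songs and for all other songs.
popular = [10, 12, 14]
other = [20, 22, 24]

print(cohens_d(popular, other))
print(welch_t(popular, other))
```

Reporting the effect size next to the test statistic matters at this corpus scale, because with half a million songs even negligible differences reach statistical significance.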
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address the two major comments point by point below and outline the revisions we will make.
- Referee: [Abstract] The headline claim that 'biases in song lyrics correlate with prior results on human subjects' is asserted without any reported correlation coefficient, p-value, word-set details, or embedding hyperparameters; the equivalence of WEAT d-values across lyrics versus general-text corpora is therefore untested and load-bearing for the societal-reflection interpretation.
  Authors: We agree that the abstract should supply the quantitative support for the reported correlation. In the revision we will add the Pearson correlation coefficient and associated p-value between the lyrics-derived WEAT scores and the human-subject WEAT/IAT scores, together with the exact word sets employed and the embedding hyperparameters (dimensionality, window size, training algorithm). These additions will make the equivalence claim explicit rather than implicit.
  Revision: yes
- Referee: [Methods] (WEAT application section) No diagnostic is presented that checks whether rhyme-driven co-occurrence, high repetition, or genre skew in lyrics preserves the target-attribute association geometry on which WEAT was validated against human data; absent such a check, an observed numerical correlation could be an embedding artifact rather than shared bias content.
  Authors: The concern is valid: lyrics-specific properties could in principle distort the geometry that WEAT relies upon. We will therefore add a short diagnostic subsection that (i) trains control embeddings on shuffled lyric tokens and on genre-balanced subsamples and (ii) recomputes the same WEAT tests, reporting whether the target-attribute associations remain stable. If the associations prove sensitive, we will qualify the societal-reflection interpretation accordingly.
  Revision: yes
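The two control corpora named in the response could be built roughly as follows. This is a sketch under assumptions: "shuffled lyric tokens" is read here as pooling all tokens, shuffling them across the corpus, and refilling each song to its original length, which keeps word frequencies but destroys co-occurrence. "Genre-balanced" is read as sampling the same number of songs from every genre. Both helper names are hypothetical, and the embedding training and WEAT recomputation steps are not shown.

```python
import random


def shuffle_tokens(songs, seed=0):
    """Shuffle tokens across the corpus, preserving each song's length."""
    rng = random.Random(seed)
    pool = [token for song in songs for token in song]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for song in songs:
        shuffled.append(pool[start:start + len(song)])
        start += len(song)
    return shuffled


def genre_balanced_sample(songs_by_genre, seed=0):
    """Draw the same number of songs from every genre, without replacement."""
    rng = random.Random(seed)
    per_genre = min(len(songs) for songs in songs_by_genre.values())
    return {
        genre: rng.sample(songs, per_genre)
        for genre, songs in songs_by_genre.items()
    }


# Tiny invented corpus: each song is a list of tokens.
songs = [["love", "me", "do"], ["you", "know", "i", "love", "you"]]
print(shuffle_tokens(songs, seed=1))
print(genre_balanced_sample({"rock": ["s1", "s2", "s3"], "pop": ["s4", "s5"]}, seed=1))
```

If WEAT scores computed on the shuffled corpus stay close to those from the real corpus, the measured associations are more likely driven by word frequency than by lyric context.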
Circularity Check
No significant circularity; central claim rests on external WEAT comparisons to prior human-subject results
The paper conducts an empirical study: it trains embeddings on a large lyrics corpus, applies the standard WEAT test for gender/racial biases, and reports numerical correlations with previously published WEAT results obtained on human subjects or other corpora. No equations, parameter-fitting steps, or derivations appear that would reduce any claimed result to its own inputs by construction. The load-bearing comparison is to external, independently published benchmarks rather than to any self-citation chain or internal fit. Consequently the analysis is self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Word embeddings trained on song lyrics can be used with the WEAT test to measure gender and racial biases in a manner comparable to human-subject studies.