arxiv: 1907.07818 · v1 · pith:NX7K75WNnew · submitted 2019-07-17 · 💻 cs.IR · cs.CL

Decoding the Style and Bias of Song Lyrics

Pith reviewed 2026-05-24 19:51 UTC · model grok-4.3

classification 💻 cs.IR cs.CL
keywords song lyrics · word embeddings · WEAT test · gender bias · racial bias · music analysis · computational social science · bias measurement

The pith

Song lyrics from over half a million tracks carry gender and racial biases that align with measurements from human subjects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies computational methods to characterize style and bias across a large collection of song lyrics spanning five decades. Style is quantified through vocabulary size, lyric length, repetitiveness, speed of delivery, and readability scores, revealing systematic differences between popular and other songs. Bias is measured by training word embeddings on the lyrics and running the Word Embedding Association Test for gender and racial associations. The resulting bias scores correlate with earlier WEAT results obtained from human participants, which the authors interpret as evidence that lyrics embed and reflect prevailing societal biases. This large-scale approach moves beyond prior manual analysis of small lyric sets.
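The style measures named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tokenizer, the function names, the definition of repetitiveness as one minus the share of distinct words, and the definition of speed as words per second are all assumptions made here. Only the definition of length as the number of words in the song comes from the paper's text.

```python
import re
from typing import Dict, List


def tokenize(lyrics: str) -> List[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", lyrics.lower())


def style_metrics(lyrics: str, duration_seconds: float) -> Dict[str, float]:
    """Compute simple per-song style measures (hypothetical definitions)."""
    tokens = tokenize(lyrics)
    length = len(tokens)              # number of words in the song
    vocabulary = len(set(tokens))     # number of distinct words
    # Assumed definition: share of tokens that repeat an earlier word.
    repetitiveness = 1.0 - vocabulary / length if length else 0.0
    # Assumed definition: words delivered per second of audio.
    speed = length / duration_seconds if duration_seconds > 0 else 0.0
    return {
        "length": float(length),
        "vocabulary": float(vocabulary),
        "repetitiveness": repetitiveness,
        "speed": speed,
    }


metrics = style_metrics("la la la hey", duration_seconds=2.0)
```

Readability is left out because a Flesch-Kincaid score needs sentence and syllable counts, and lyrics rarely carry reliable sentence boundaries.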

Core claim

Distributed word representations trained on song lyrics, when tested with WEAT for gender and racial categories, produce bias measurements that correlate with prior results on human subjects, indicating that song lyrics reflect the biases that exist in society.

What carries the argument

Word embeddings derived from song lyrics combined with the Word Embedding Association Test (WEAT), which scores associations between target sets (e.g., male/female names) and attribute sets (e.g., career/family or pleasant/unpleasant terms).
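As a rough sketch of the scoring step, the WEAT effect size defined by Caliskan et al. (2017) compares how strongly two target sets X and Y associate with two attribute sets A and B under cosine similarity. The function names and the toy two-dimensional vectors below are illustrative assumptions, not embeddings or word sets from the paper. The use of the sample standard deviation is also an assumption, since implementations differ on that detail.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X: List[Vector], Y: List[Vector],
                     A: List[Vector], B: List[Vector]) -> float:
    """Difference of mean associations of X and Y, scaled by the spread over X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation over all target words.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy vectors (made up): X leans toward attribute A, Y leans toward attribute B.
X = [(1.0, 0.1), (1.0, 0.2)]
Y = [(0.1, 1.0), (0.2, 1.0)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
d = weat_effect_size(X, Y, A, B)
```

A positive `d` means X is closer to A and Y is closer to B; swapping the target sets flips the sign. In practice the vectors would be looked up from the lyric-trained embedding model.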

If this is right

  • Popular songs exhibit measurably different stylistic properties than less popular songs across vocabulary, length, repetitiveness, speed, and readability.
  • Gender and racial biases can be tracked quantitatively in lyrics over multiple decades using embedding methods.
  • The correlation between lyric biases and human-subject biases supports the claim that lyrics mirror existing societal attitudes.
  • Large-scale embedding analysis extends bias measurement to cultural artifacts previously studied only through small manual samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If lyrics both reflect and potentially reinforce biases, targeted changes in lyric content could be tested for downstream effects on listener attitudes.
  • The same embedding-plus-WEAT pipeline could be applied to other large text collections such as movie scripts or social media to compare how different media embed biases.
  • Decade-by-decade tracking of bias scores might reveal whether lyric biases have intensified or attenuated relative to societal measures over time.

Load-bearing premise

The assumption that WEAT scores computed on lyric-derived embeddings are directly comparable to WEAT scores from human-subject studies or other text corpora, allowing a correlation to mean that lyrics reflect societal biases.

What would settle it

A dataset of independent societal bias measurements (from surveys or other corpora) that shows no statistical correlation with the WEAT scores obtained from the song-lyric embeddings.
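One way to run such a check is to correlate the lyric-derived WEAT scores with the independent measurements and ask how often a correlation that strong arises by chance. The sketch below uses a Pearson coefficient with a permutation test. The function names, the choice of a permutation test, and the default number of permutations are assumptions made for illustration; the paper does not report which statistic it used.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

With only a handful of WEAT tests the permutation space is small, so a large p-value from this check would be weak evidence either way.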

Figures

Figures reproduced from arXiv: 1907.07818 by Amit Awekar, Manash Pratim Barman, Sambhav Kothari.

Figure 2: Year-wise rank comparison
  Surrounding text from the paper: "… studies have reported adverse effects of inappropriate content in music on the listeners [11]. We also measured the length of song lyrics as the number of words in the song. Please refer to Figure 4a. Other songs have shown a steady increase in length from 1965 to 2015. Popular songs also showed similar trend till 1980. However, since then popular songs are significantly more leng…"

Figure 3: Comparison of swear word usage

Figure 4: Comparison of Length, Duration, and Speed of songs

Figure 5: Comparison of repetitiveness

Figure 6: Comparison of FK score
  Surrounding text from the paper: "4 BIAS MEASUREMENT. We humans have certain biases in our thinking. For example, some people can find flower names more pleasant and insect names more unpleasant. These biases reflect in our various activities such as politics, movies, and song lyrics as well. Implicit Association Test (IAT) is a well-known test designed to measure such biases in human beings [9]. This test involves tw…"
Original abstract

The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: style and biases of song lyrics. All prior works to understand these two aspects are limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize the lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We have observed that the style of popular songs significantly differs from other songs. We have used distributed representation methods and WEAT test to measure various gender and racial biases in the song lyrics. We have observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. Increasing consumption of music and the effect of lyrics on human emotions makes this analysis important.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes style and bias in >500k song lyrics over five decades. Style is quantified via vocabulary, length, repetitiveness, speed and readability, with popular songs shown to differ from others. Gender and racial biases are measured via word embeddings trained on the lyrics corpus and the WEAT test; observed correlations with prior human-subject WEAT/IAT results are interpreted as evidence that lyrics reflect societal biases.

Significance. A sound demonstration that WEAT scores on lyrics embeddings are commensurable with human-subject scales would supply large-scale, falsifiable evidence linking popular-culture text to societal bias patterns, strengthening the case for embedding-based cultural analytics in IR. The scale of the corpus is a clear asset; the current manuscript supplies none of the required validation steps.

major comments (2)
  1. [Abstract] Abstract: the headline claim that 'biases in song lyrics correlate with prior results on human subjects' is asserted without any reported correlation coefficient, p-value, word-set details, or embedding hyperparameters; the equivalence of WEAT d-values across lyrics versus general-text corpora is therefore untested and load-bearing for the societal-reflection interpretation.
  2. [Methods] Methods (WEAT application section): no diagnostic is presented that checks whether the rhyme-driven co-occurrence, high repetition, or genre skew of lyrics preserves the target-attribute association geometry for which WEAT was validated against human-subject data; absent such a check, an observed numerical correlation could be an embedding artifact rather than shared bias content.
minor comments (2)
  1. [Abstract] Abstract: 'more than half a million songs' should be replaced by the exact count and a citation to the data source.
  2. [Abstract] The abstract states that popular songs 'significantly differ' in style metrics but supplies no statistical test or effect-size table reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that 'biases in song lyrics correlate with prior results on human subjects' is asserted without any reported correlation coefficient, p-value, word-set details, or embedding hyperparameters; the equivalence of WEAT d-values across lyrics versus general-text corpora is therefore untested and load-bearing for the societal-reflection interpretation.

    Authors: We agree that the abstract should supply the quantitative support for the reported correlation. In the revision we will add the Pearson correlation coefficient and associated p-value between the lyrics-derived WEAT scores and the human-subject WEAT/IAT scores, together with the exact word sets employed and the embedding hyperparameters (dimensionality, window size, training algorithm). These additions will make the equivalence claim explicit rather than implicit. revision: yes

  2. Referee: [Methods] Methods (WEAT application section): no diagnostic is presented that checks whether the rhyme-driven co-occurrence, high repetition, or genre skew of lyrics preserves the target-attribute association geometry for which WEAT was validated against human-subject data; absent such a check, an observed numerical correlation could be an embedding artifact rather than shared bias content.

    Authors: The concern is valid: lyrics-specific properties could in principle distort the geometry that WEAT relies upon. We will therefore add a short diagnostic subsection that (i) trains control embeddings on shuffled lyric tokens and on genre-balanced subsamples and (ii) recomputes the same WEAT tests, reporting whether the target-attribute associations remain stable. If the associations prove sensitive, we will qualify the societal-reflection interpretation accordingly. revision: yes
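The shuffled-token control in step (i) could be prepared along the following lines. The rebuttal does not say whether tokens are shuffled within each song or across the corpus, so this sketch assumes a corpus-wide shuffle that keeps each song's length and the overall word frequencies while destroying co-occurrence structure. The function name is hypothetical, and training the control embeddings on the result is left out.

```python
import random
from typing import List


def shuffle_corpus_tokens(songs: List[List[str]], seed: int = 0) -> List[List[str]]:
    """Redistribute all tokens at random while preserving each song's length."""
    rng = random.Random(seed)
    pool = [token for song in songs for token in song]
    rng.shuffle(pool)  # word frequencies survive, word order and context do not
    shuffled, start = [], 0
    for song in songs:
        shuffled.append(pool[start:start + len(song)])
        start += len(song)
    return shuffled
```

If WEAT scores computed on embeddings trained from the shuffled corpus stay close to the original scores, the associations are more likely a frequency artifact than a reflection of how words are used together.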

Circularity Check

0 steps flagged

No significant circularity; central claim rests on external WEAT comparisons to prior human-subject results

full rationale

The paper conducts an empirical study: it trains embeddings on a large lyrics corpus, applies the standard WEAT test for gender/racial biases, and reports numerical correlations with previously published WEAT results obtained on human subjects or other corpora. No equations, parameter-fitting steps, or derivations appear that would reduce any claimed result to its own inputs by construction. The load-bearing comparison is to external, independently published benchmarks rather than to any self-citation chain or internal fit. Consequently the analysis is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or detailed axioms are stated beyond the implicit assumption that WEAT applies to lyric text.

axioms (1)
  • domain assumption: Word embeddings trained on song lyrics can be used with the WEAT test to measure gender and racial biases in a manner comparable to human-subject studies.
    The paper invokes this to interpret correlations as evidence that lyrics reflect societal biases.

pith-pipeline@v0.9.0 · 5675 in / 1220 out tokens · 30605 ms · 2026-05-24T19:51:22.911978+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. Craig A. Anderson, Nicholas L. Carnagey, and Janie Eubanks. 2003. Exposure to violent media: The effects of songs with violent lyrics on aggressive thoughts and feelings. Journal of Personality and Social Psychology (2003), 960–971.
  2. Manash Pratim Barman, Kavish Dahekar, Abhinav Anshuman, and Amit Awekar. 2019. It’s Only Words And Words Are All I Have. To Appear at the European Conference on Information Retrieval (ECIR) 2019 (2019).
  3. T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. 2011. The million song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).
  4. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
  5. Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356 (2017), 183–186.
  6. Joseph C. Nunes, Andrea Ordanini Jr, and Francesca Valsesia. 2015. The power of repetition: repetitive lyrics in a song increase processing fluency and drive market success. Journal of Consumer Psychology 25(2) (2015), 187–199.
  7. Laura Doering and Sarah Thébaud. 2017. The Effects of Gendered Occupational Roles on Men’s and Women’s Workplace Authority: Evidence from Microfinance. American Sociological Review 82(3) (June 2017), 542–567.
  8. Michael Fell and Caroline Sporleder. 2014. Lyrics-based Analysis and Classification of Music. In COLING.
  9. Anthony G. Greenwald, Debbie E. McGhee, and Jordan L. K. Schwartz. 1998. Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology 74(6) (1998), 1464–1480.
  10. Tobias Greitemeyer. 2009. Effects of songs with prosocial lyrics on prosocial thoughts, affect, and behavior. Journal of Experimental Social Psychology 45 (January 2009), 186–190.
  11. P. Cougar Hall, Joshua H. West, and Shane Hill. 2012. Sexualization in Lyrics of Popular Music from 1959 to 2009: Implications for Sexuality Educators. Sexuality & Culture 16(2) (2012), 103–117.
  12. J. P. Kincaid, R. P. Fishburne Jr, R. L. Rogers, and B. S. Chissom. 1975. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. DTIC Document, Tech. Rep. (1975).
  13. Peter Knees and Markus Schedl. 2013. A Survey of Music Similarity and Recommendation from Music Context Data. ACM Trans. Multimedia Comput. Commun. Appl. 10, 1, Article 2 (Dec. 2013), 21 pages. https://doi.org/10.1145/2542205.2542206
  14. Judy C. Koskei, Margaret Barasa, and Beatrice Manyasi. 2018. Stereotypical Portrayal of Women in Kipsigis Secular Songs. European Journal of Literature, Language and Linguistics Studies (2018). https://www.oapub.org/lit/index.php/EJLLL/article/view/49
  15. R. Mayer, R. Neumayer, and A. Rauber. 2008. Rhyme and Style Features for Musical Genre Classification by Song Lyrics. In Proceedings of International Conference on Music Information Retrieval (ISMIR) (June 2008), 337–342.
  16. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
  17. Nielsen. 2018. Nielsen U.S. Music Mid-Year Report. https://www.nielsen.com/us/en/insights/reports/2018/us-music-mid-year-report-2018.html
  18. Eric E. Rasmussen and Rebecca L. Densley. 2016. Girl in a Country Song: Gender Roles and Objectification of Women in Popular Country Music across 1990 to … Sex Roles 76 (2016), 188–201.
  19. E. Sapir. 1985. Selected writings of Edward Sapir in language, culture and personality. Univ of California Press 342 (1985).

Decoding the Style and Bias of Song Lyrics. SIGIR ’19, July 21–25, 2019, Paris, France.