Splits! is a Reddit dataset split by demographic and topic, constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena; a two-stage filtering process identifies promising candidates.
What kinds of words/phrases might Demographic A use that Demographic B would not? Specifically, we care about such words/phrases that are not obvious, or unexpected.
Cited by 1 Pith paper (field: cs.CL; year: 2025; verdict: unverdicted):
Splits! Flexible Sociocultural Linguistic Investigation at Scale