arxiv: 1401.6058 · v1 · pith:ZTDEKRPInew · submitted 2014-01-23 · 💻 cs.SI · physics.soc-ph

The Readability of Tweets and their Geographic Correlation with Education

classification 💻 cs.SI physics.soc-ph
keywords: tweets, readability, ease, communication, correlation, data, difference, education

Twitter has rapidly emerged as one of the largest worldwide venues for written communication. Thanks to the ease with which vast quantities of tweets can be mined, Twitter has also become a source for studying modern linguistic style. The readability of text has long provided a simple method to characterize the complexity of language and the ease with which documents may be understood by readers. In this note we apply a modified version of the Flesch Reading Ease formula to a corpus of 17.4 million tweets. We find that tweets have characteristically more difficult readability scores than other short-format communication, such as SMS or chat. This linguistic difference is insensitive to the presence of "hashtags" within tweets. By utilizing geographic data provided by 2% of users, joined with "ZIP Code Tabulation Area" (ZCTA) level education data from the U.S. Census, we find an intriguing correlation between the average readability and the college graduation rate within a ZCTA. This points towards either a difference in the underlying language or a change in the type of content being tweeted in these areas.
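For orientation, the standard (unmodified) Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), with higher scores indicating easier text. The abstract does not describe the authors' modification, so the following is only a minimal sketch of the standard formula. The function names, the vowel-group syllable heuristic, and the sentence and word splitting rules are all assumptions made for illustration, and no tweet-specific handling (hashtags, URLs, mentions) is attempted.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough, assumed heuristic)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease; NOT the modified version used in the paper."""
    # Assumed splitting rules: sentences end at runs of . ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Multidimensional characterization necessitates considerable deliberation."
)
print(f"easy: {easy:.3f}, hard: {hard:.3f}")
```

Under this heuristic, short sentences of monosyllabic words score high (easy), while long polysyllabic words push the score down, which is the sense in which the abstract describes tweets as having "more difficult" readability scores than SMS or chat.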
