Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand
Pith reviewed 2026-05-10 08:31 UTC · model grok-4.3
The pith
New Zealand-related Reddit communities link language to place and together form a contiguous speech community, although their alignment with geographic dialect communities is complex; Word2Vec embeddings trained on a 4.26-billion-word corpus reveal semantic variation across communities and semantic shifts within NZ English.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Users generally associate language with place, and place-related communities form a contiguous speech community, though alignment between geographic dialect communities and place-related communities remains complex. Advanced language modelling, including static and diachronic Word2Vec language embeddings, revealed semantic variation across place-based communities and meaningful semantic shifts within New Zealand English.
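The summary does not say how the diachronic embeddings were compared over time. A common approach trains one Word2Vec model per time slice, rotates each model onto a reference slice with orthogonal Procrustes, and scores each word's shift as the cosine distance between its aligned vectors. The sketch below illustrates that general approach only. It assumes the per-slice matrices already share one vocabulary in the same row order, and the function names (`unit_rows`, `procrustes_align`, `semantic_shift`) are hypothetical, not taken from the thesis.

```python
import numpy as np


def unit_rows(m: np.ndarray) -> np.ndarray:
    """Scale every row (one word vector) to unit length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def procrustes_align(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Rotate `other` onto `base` using the orthogonal matrix R that minimises
    the Frobenius norm ||other @ R - base||, obtained from an SVD."""
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def semantic_shift(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Per-word cosine distance between two time slices after alignment.

    Assumption: row i of `base` and row i of `other` are the same word.
    """
    base_u, other_u = unit_rows(base), unit_rows(other)
    aligned = procrustes_align(base_u, other_u)
    # An orthogonal map keeps rows at unit length, so the dot product is the cosine.
    cosine = np.sum(base_u * aligned, axis=1)
    return 1.0 - cosine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_a = rng.normal(size=(50, 10))              # toy stand-in for an early model
    q, _ = np.linalg.qr(rng.normal(size=(10, 10)))   # arbitrary rotation
    slice_b = slice_a @ q                            # same geometry, rotated
    slice_b[3] *= -1.0                               # toy "meaning change" for word 3
    print(int(np.argmax(semantic_shift(slice_a, slice_b))))
```

Because the rotation is constrained to be orthogonal, it cannot distort distances within a slice, so a large residual distance for one word suggests a change in its neighbourhood rather than an artefact of the alignment step.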
Load-bearing premise
That Reddit communities tied to places accurately represent geographic dialect communities and that user perceptions of language-place links correspond to measurable patterns in actual language use.
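One way to probe the second half of this premise is to check whether a lexical item that users associate with a place is measurably over-represented in that place's community. The sketch below uses Dunning's log-likelihood (G2) keyness statistic, a standard corpus-linguistic test; the thesis summary does not state which statistic was used, and the counts in the example are invented for illustration.

```python
import math


def per_million(freq: int, corpus_size: int) -> float:
    """Normalise a raw count so subcorpora of different sizes can be compared."""
    return freq * 1_000_000 / corpus_size


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Dunning's G2 for one word in community corpus A versus reference corpus B.

    Larger values mean the word's split across the two corpora departs further
    from what the corpus sizes alone would predict.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Invented counts for a hypothetical place-associated word.
    a_freq, a_size = 120, 2_000_000     # hypothetical regional subreddit
    b_freq, b_size = 300, 40_000_000    # hypothetical national reference corpus
    print(per_million(a_freq, a_size), per_million(b_freq, b_size))
    print(log_likelihood(a_freq, a_size, b_freq, b_size))
```

With one degree of freedom, G2 above 3.84 corresponds to p < 0.05 and above 6.63 to p < 0.01. G2 is non-directional, so the per-million rates are still needed to see which community uses the word more.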
Original abstract
This thesis investigates geographic dialect alignment in place-informed social media communities, focussing on New Zealand-related Reddit communities. By integrating qualitative analyses of user perceptions with computational methods, the study examines how language use reflects place identity and patterns of language variation and change based on user-informed lexical, morphosyntactic, and semantic variables. The findings show that users generally associate language with place, and place-related communities form a contiguous speech community, though alignment between geographic dialect communities and place-related communities remains complex. Advanced language modelling, including static and diachronic Word2Vec language embeddings, revealed semantic variation across place-based communities and meaningful semantic shifts within New Zealand English. The research involved the creation of a corpus containing 4.26 billion unprocessed words, which offers a valuable resource for future study. Overall, the results highlight the potential of social media as a natural laboratory for sociolinguistic inquiry.
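The abstract does not describe how the corpus was assembled. As a minimal sketch, assuming the comments are stored as newline-delimited JSON with the usual Reddit comment fields `subreddit`, `created_utc` (Unix seconds) and `body`, the community-by-year slices needed for place-level and diachronic embeddings can be sized as below. Whitespace splitting is a crude stand-in for whatever tokenisation the thesis used, so its totals need not match the reported 4.26 billion words; the function name is hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def count_tokens_by_slice(lines: Iterable[str]) -> Counter:
    """Count whitespace-delimited tokens per (subreddit, year) slice.

    Assumes one JSON comment object per line; blank lines and the
    "[deleted]" / "[removed]" placeholders are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        body = comment.get("body", "")
        if body in ("[deleted]", "[removed]"):
            continue
        # Some dumps store the timestamp as a string, so coerce it first.
        ts = int(float(comment["created_utc"]))
        year = datetime.fromtimestamp(ts, tz=timezone.utc).year
        counts[(comment["subreddit"].lower(), year)] += len(body.split())
    return counts
```

Lower-casing the subreddit name merges variants such as `NewZealand` and `newzealand` into one community slice.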
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No circularity: empirical corpus analysis and embeddings rest on external data, not self-definition.
The paper constructs a 4.26B-word Reddit corpus, performs a qualitative analysis of user perceptions, and applies standard static and diachronic Word2Vec embeddings to measure semantic variation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or the described methods. Claims about place-language associations derive from patterns observed in the collected data rather than reducing to their inputs by construction. The argument can be checked against the external corpus and does not rely on uniqueness theorems or ansatzes from the author's prior work that would force the result.
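Semantic variation across place-based communities can be measured without aligning embedding spaces by comparing a word's nearest neighbours in models trained separately on each community, because neighbour sets are unaffected by rotations of the space. The sketch below computes the Jaccard overlap of those neighbour sets. It assumes each model is supplied as a matrix of vectors plus a parallel vocabulary list, the function names are hypothetical, and it is one possible metric rather than necessarily the one used in the thesis.

```python
import numpy as np


def nearest_neighbours(vectors: np.ndarray, vocab: list[str], word: str, k: int) -> set[str]:
    """Return the k words with the highest cosine similarity to `word`.

    Raises ValueError if `word` is not in `vocab`.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = vocab.index(word)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return {vocab[i] for i in top}


def neighbour_overlap(vec_a: np.ndarray, vocab_a: list[str],
                      vec_b: np.ndarray, vocab_b: list[str],
                      word: str, k: int = 10) -> float:
    """Jaccard overlap of `word`'s k nearest neighbours in two community models.

    1.0 means identical neighbour sets; 0.0 means no shared neighbours.
    """
    na = nearest_neighbours(vec_a, vocab_a, word, k)
    nb = nearest_neighbours(vec_b, vocab_b, word, k)
    return len(na & nb) / len(na | nb)
```

An overlap near 1 means the word keeps similar company in both communities; an overlap near 0 flags it as a candidate for community-specific meaning, which would still need qualitative checking against the comments themselves.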
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Language use in place-informed social media communities reflects geographic dialect alignment and place identity.