pith. machine review for the scientific record.

arxiv: 2604.15744 · v1 · submitted 2026-04-17 · 💻 cs.CL

Recognition: unknown

Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand

Pith reviewed 2026-05-10 08:31 UTC · model grok-4.3

classification 💻 cs.CL
keywords communities · language · alignment · dialect · geographic · media · place · semantic

The pith

New Zealand Reddit users link language to place, and place-related communities form a contiguous speech community whose alignment with geographic dialect communities is complex; Word2Vec embeddings reveal semantic variation and semantic shift in New Zealand English, drawing on a corpus of 4.26 billion words.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work mixes qualitative checks on how people think language connects to location with computer models that turn words into numbers to track meaning. It looks at word choice, grammar patterns, and overall sense in online groups tied to specific New Zealand places. Static embeddings capture current differences between these groups while diachronic versions track how word meanings have shifted over time in New Zealand English. A very large set of raw text from these communities was assembled to support the analysis and future projects. The results indicate that place-based online groups behave like neighboring speech communities, yet the exact match to real geographic dialects is not straightforward. This approach treats social media as a living record of how language varies and evolves with identity and location.

Core claim

Users generally associate language with place, and place-related communities form a contiguous speech community, though alignment between geographic dialect communities and place-related communities remains complex. Advanced language modelling, including static and diachronic Word2Vec language embeddings, revealed semantic variation across place-based communities and meaningful semantic shifts within New Zealand English.
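To make the embedding comparison concrete, the following is a minimal Python sketch of how a word's nearest neighbours by cosine similarity can differ between two embedding spaces. The three-dimensional vectors, the `nz_toy` and `news_toy` dictionaries, and the helper names are assumptions invented for illustration. They are not values or code from the thesis, which trains Word2Vec models on the Reddit corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=3):
    """Rank every other word by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors: in the first space "tramp" sits near outdoor
# activities, in the second it sits near "swagman".
nz_toy = {
    "tramp": [0.9, 0.1, 0.0],
    "hiking": [0.8, 0.2, 0.1],
    "kayaking": [0.7, 0.3, 0.0],
    "swagman": [0.1, 0.9, 0.2],
}
news_toy = {
    "tramp": [0.1, 0.9, 0.1],
    "hiking": [0.8, 0.2, 0.1],
    "kayaking": [0.7, 0.3, 0.0],
    "swagman": [0.2, 0.8, 0.2],
}

nz_neighbours = most_similar("tramp", nz_toy, top_n=1)
news_neighbours = most_similar("tramp", news_toy, top_n=1)
print(nz_neighbours[0][0], news_neighbours[0][0])
```

With real models the same ranking would run over the full vocabulary; the toy data only shows why the top neighbour of one word can change from one training corpus to another.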

Load-bearing premise

That Reddit communities tied to places accurately represent geographic dialect communities and that user perceptions of language-place links correspond to measurable patterns in actual language use.

Figures

Figures reproduced from arXiv: 2604.15744 by Sidney Wong.

Figure 2.1: Tripartite Model of Place. Description: Tripartite model of place from Agnew (1987) as adapted from Reed (2020a). The three fundamental components are linked to setting, connection, and identity. Sense of Place: Canadian geographer Edward Relph explained that place “has a range of subtleties and significances as great as the range of human experiences and intention” (Relph, 1976, p. 26). This suggests there…
Figure 2.2: Sociotheoretical perspectives adapted from …
Figure 4.1: Screenshot of Selfpost 155d1f3. Evaluation: As a tool of inquiry, discourse analysis provides the data to support a given qualitative hypothesis (Gee, 2005). Based on SQ1, my aim is to determine whether users in place-based communities associate language use with place identity. Firstly, I predict users do associate their place identity with their language use. I also predict users are intuitively attuned …
Figure 4.2: Screenshot of Selfpost 2 (1102c68)
Figure 4.3: Selfpost 2 (1102c68): Cognitive I-Statements. … used the selfpost as a platform to initiate a discussion with the users of r/newzealand, as indicated by the Discussion post flair. Not only did the OP want the users to engage with the selfpost through upvoting, but the OP also encouraged comments through their open-ended question in the coda of sub-story 1: "What else have you seen US or otherwise?" By includ…
Figure 5.1: Clustering User-informed Variables across Country-level Communities
Figure 5.2: Clustering User-informed Variables for r/newzealand. Description: This scatterplot visualises user-informed lexical variables for conservative variants within r/newzealand, utilizing k-means clustering (k = 2) and PCA for dimensionality reduction on token frequency (n) and proportions. The analysis reveals two distinct groupings: Cluster 0 (blue), consisting of low token frequency and non-dominant conser…
Figure 5.3: Emergence and Growth of nek minnit. Description: This lineplot visualises the emergence and growth of the lexical item nek minnit and its variants over time within r/newzealand. The feature first emerged in 2011, followed by a decline in usage in 2013, after which it has shown a consistent increase in frequency through to 2024; all data is sourced from Reddit/Pushshift (Baumgartner et al., 2020).
Figure 5.4: User Engagement by Hour between Communities
Figure 5.5: Proportional Frequency by Hour (tramp and tramping). Description: This figure illustrates the proportion of user-informed variables tramp (a) and tramping (b) over a 24-hour period within r/newzealand comments (rcomm). Observations indicate a minor increase in the proportional frequency of the innovative variants hike and hiking between 12:00 and 18:00 UTC, a window corresponding approximately to midnight…
Figure 5.6: Proportional Frequency by Hour (Other Variables)
Figure 6.1: Lineplots of Classifier Performance by Sample Size
Figure 6.2: Stability of Dialect Classification Over Time
Figure 6.3: Most Similar to tramp in NewZealand. Description: This visualisation presents the top 30 words most similar to tramp by cosine similarity within the Word2Vec embedding model trained on New Zealand city-level communities (NewZealand) using the skip-gram with negative sampling architecture. The terms most similar to tramp are hiking, kayaking, skiing, and surfing; all data is sourced from Reddit/Pushshift (…
Figure 6.4: Most Similar to tramp in Google300. Description: This visualisation presents the top 30 words most similar to tramp by cosine similarity within the Word2Vec embedding model trained on the Google News dataset (Google300; Mikolov et al., 2013). The highest-ranking terms include dipsomaniac, boozehound, swineherd, swagman, and doss_house; data is sourced from the Google300 corpus (Mikolov et al., 2013).
Figure 6.5: Clustering Analysis of tramp across Embedding Models. Description: This figure displays subfigures comparing the NewZealand and Google300 models for the term tramp, utilising dimensionality reduction via PCA and k-means clustering of semantically similar keywords. The analysis includes a conservative variant (mountain, trek, hike, trail), an innovative variant (poverty, homeless, street), and neutral term…
Figure 7.1: Network Graph of Related Communities. Description: This network graph visualises communities related to r/newzealand based on user engagement, originally developed by u/avanka to measure Jaccard similarity among users posting across multiple communities between August and September 2018. The visualisation demonstrates that users active on r/newzealand also engage with other place-based and special intere…
Figure 7.2: User Engagement by Hour in New Zealand-related Communities
Figure 7.3: User Engagement in New Zealand-related Communities (…
Figure 7.4: User Engagement in New Zealand-related Communities (…
Figure 7.5: Example of a Syntactic Construction. Description: This barplot (a) and lineplot (b) illustrate the frequency and temporal distribution of the syntactic construction syn:ADP – syn:the – syn:<200>, which captures New Zealand-related political discourse with a specific focus on Te Tiriti o Waitangi (the Treaty of Waitangi). Primarily sourced from r/newzealand and r/ConservativeKiwi, the construction exhibits…
Figure 7.6: Example of All (Combined) Construction. Description: This barplot (a) and lineplot (b) illustrate the combined frequency and temporal trajectory of the construction sem:the > sem:<1089> related to the Covid-19 pandemic, where C2xG captures both syntactic and semantic constraints. Primarily sourced from r/newzealand and r/ConservativeKiwi, the construction emerged in 2019 and reached peak usage in 2022; th…
Figure 7.7: Scatterplot of Users Ranked by Score. Description: This scatterplot displays the score and rank of users within city-level communities, sorted in descending order, revealing a noticeable Zipfian power law distribution among users with the highest scores; all data is sourced from Reddit/Pushshift (Baumgartner et al., 2020).
Figure 7.8: Boxplot of Behavioural Measures. Description: This boxplot visualises the distribution of users’ lifespan in years and engagement ratios by decile within city-level communities, illustrating that both behavioural measures follow a long right-tail distribution. The most long-lived users in the c1 decile exhibit a mean lifespan between four and six years, whereas younger cohorts from c6 onwards have a lifes…
Figure 7.9: Cosine Similarity of Lexical Features by Lifespan Cohort
Figure 7.10: Cosine Similarity of Syntactic Features by Lifespan Cohort
Figure 7.11: Cosine Similarity of Lexical Features by Engagement Ratio
Figure 7.12: Cosine Similarity of Syntactic Features by Engagement Ratio
Figure 7.13: Observations by Lifespan and Engagement Ratio Cohorts
Figure 7.14: Cosine Similarity of Lexical Features from …
Figure 7.15: Cosine Similarity of Syntactic Features from …
Figure 7.16: Network Graphs for rstext by Lexical Features. Description: This figure displays subplots visualising network graphs for selfpost body texts (rstext) based on cosine similarity (a) and latent communities (b) using C2xG lexical features (lex), with cosine similarity below 50%. Nodes are colour-coded to represent r/newzealand in black, six city-level communities in blue, 14 peripheral communities in red, …
Figure 7.17: Network Graphs for rcomm by Lexical Features. Description: This figure displays subplots visualising network graphs for comments (rcomm) based on cosine similarity (a) and latent communities (b) using C2xG lexical features (lex), with cosine similarity below 50%. Nodes are colour-coded to represent r/newzealand in black, six city-level communities in blue, 14 peripheral communities in red, and remaining …
Figure 7.18: Network Graphs for rcomm by Syntactic Features. Description: This figure displays subplots visualising network graphs for comments (rcomm) based on cosine similarity (a) and latent communities (b) using C2xG syntactic features (syn), with cosine similarity below 50%. Nodes are colour-coded to represent r/newzealand in black, six city-level communities in blue, 14 peripheral communities in red, and remain…
Figure 7.19: Confusion Matrix from C2xG Classification Model. Description: This figure displays a confusion matrix for the New Zealand-related groupings classification model, utilising SEM+ features parsed via C2xG and trained on New Zealand-related communities; all data is sourced from Reddit/Pushshift (Baumgartner et al., 2020).
Figure 7.20: Cosine Similarity of the Hypocoristic Word Pairs
Figure 7.21: Semantic Shift in chippy. Description: This figure depicts the semantic shift of chippy across three time periods, including chippy1 (before 2016-11), chippy5 (2021-07 to 2022-01), and chippy10 (after 2024-06), based on data sourced from Reddit/Pushshift (Baumgartner et al., 2020).
Figure 7.22: Semantic Shift in snapper. Description: This figure illustrates the semantic shift of snapper over three distinct periods, specifically snapper1 (before 2016-11), snapper5 (2021-07 to 2022-01), and snapper10 (after 2024-06), with data sourced from Reddit/Pushshift (Baumgartner et al., 2020).
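The caption of Figure 7.1 mentions Jaccard similarity among users posting across multiple communities. As a rough illustration of that measure only, the sketch below computes the overlap between sets of posters. The community names `r/example_city` and `r/example_hobby`, the user identifiers, and the function name are hypothetical, and nothing here reproduces the original graph or its data.

```python
def jaccard_similarity(users_a, users_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(users_a), set(users_b)
    union = a | b
    if not union:
        return 0.0  # two empty communities share nothing by convention
    return len(a & b) / len(union)


# Hypothetical poster sets; real input would be the authors seen in each
# community during the sampling window.
posters = {
    "r/newzealand": {"u1", "u2", "u3", "u4"},
    "r/example_city": {"u2", "u3", "u5"},
    "r/example_hobby": {"u6"},
}

overlap_city = jaccard_similarity(
    posters["r/newzealand"], posters["r/example_city"]
)
overlap_hobby = jaccard_similarity(
    posters["r/newzealand"], posters["r/example_hobby"]
)
print(overlap_city, overlap_hobby)
```

Pairs with a higher score share proportionally more posters, which is the kind of quantity an edge weight in such a network graph could encode.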
Original abstract

This thesis investigates geographic dialect alignment in place-informed social media communities, focussing on New Zealand-related Reddit communities. By integrating qualitative analyses of user perceptions with computational methods, the study examines how language use reflects place identity and patterns of language variation and change based on user-informed lexical, morphosyntactic, and semantic variables. The findings show that users generally associate language with place, and place-related communities form a contiguous speech community, though alignment between geographic dialect communities and place-related communities remains complex. Advanced language modelling, including static and diachronic Word2Vec language embeddings, revealed semantic variation across place-based communities and meaningful semantic shifts within New Zealand English. The research involved the creation of a corpus containing 4.26 billion unprocessed words, which offers a valuable resource for future study. Overall, the results highlight the potential of social media as a natural laboratory for sociolinguistic inquiry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No circularity: empirical corpus analysis and embeddings rest on external data, not self-definition.

full rationale

The paper constructs a 4.26B-word Reddit corpus, performs qualitative perception analysis, and applies standard static/diachronic Word2Vec embeddings to measure semantic variation. No equations, fitted parameters renamed as predictions, or self-citations that bear the central load appear in the provided abstract or described methods. Claims about place-language associations derive from observed patterns in the collected data rather than reducing to inputs by construction. The derivation chain is self-contained against the external corpus and does not invoke uniqueness theorems or ansatzes from prior self-work that would force the result.
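For readers unfamiliar with diachronic embeddings, a small Python sketch may help show what measuring a semantic shift can look like. It assumes, purely for illustration, that each period-tagged form of a word (labels such as chippy1, chippy5, and chippy10 appear in the caption of Figure 7.21) has a vector in one shared space, and it reports the cosine distance between consecutive periods. The two-dimensional vectors and the function names are invented, and the thesis may compute shift differently.

```python
import math


def cosine_distance(u, v):
    """One minus cosine similarity; 0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no direction to compare, treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def shift_trajectory(vectors_by_period):
    """Distances between each pair of consecutive periods, in order."""
    periods = list(vectors_by_period)
    return [
        (earlier, later, cosine_distance(vectors_by_period[earlier],
                                         vectors_by_period[later]))
        for earlier, later in zip(periods, periods[1:])
    ]


# Hypothetical vectors for three period-tagged forms of one word.
chippy_toy = {
    "chippy1": [1.0, 0.0],
    "chippy5": [1.0, 1.0],
    "chippy10": [0.0, 1.0],
}

trajectory = shift_trajectory(chippy_toy)
for earlier, later, distance in trajectory:
    print(earlier, later, round(distance, 3))
```

A larger distance between two periods would suggest that the word's typical contexts moved further apart, subject to the usual caveat that noise in small time slices can also move vectors.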

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that social media language use reliably indexes geographic place identity and dialect alignment; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)
  • domain assumption: Language use in place-informed social media communities reflects geographic dialect alignment and place identity.
    This premise underpins the entire investigation as described in the abstract.

pith-pipeline@v0.9.0 · 5440 in / 1374 out tokens · 68169 ms · 2026-05-10T08:31:52.274803+00:00 · methodology


Reference graph

Works this paper leans on

293 extracted references · 186 canonical work pages · 1 internal anchor

  1. [1]

    Abell, M. and Gordon, E. (1990). 'This objectionable colonial dialect': historical and contemporary attitudes to New Zealand speech. In Bell, A. and Holmes, J., editors, New Zealand Ways of Speaking English. Multilingual Matters, Clevedon, England; Bristol, PA.

  2. [2]

    Adam (2019). They Had Us In the First Half.

  3. [3]

    Adams, N. N. (2022). 'Scraping' Reddit posts for academic research? Addressing some blurred lines of consent in growing internet-based research trend during the time of COVID-19. International journal of social research methodology, 27(1). https://doi.org/10.1080/13645579.2022.2111816

  4. [4]

    Agha, A. (2003). The social life of cultural value. Language & Communication, 23(3):231–273. https://doi.org/10.1016/S0271-5309(03)00012-0

  5. [5]

    Agnew, J. A. (1987). Place and Politics: The Geographical Mediation of State and Society, volume 1 of Routledge Library Editions: Political Geography. Routledge, Abingdon, England; New York, NY, 3 edition.

  6. [6]

    Ainsworth, H. (2004). Regional Variation in New Zealand English: the Taranaki Sing-Song Accent. Retrieved from https://doi.org/10.26686/wgtn.16945720.v1

  7. [7]

    Amaya, A., Bach, R., Keusch, F., and Kreuter, F. (2021). New Data Sources in Social Science Research: Things to Know Before Working With Reddit Data. Social Science Computer Review, 39(5):943–960. https://doi.org/10.1177/0894439319893305

  8. [8]

    Androutsopoulos, J. and Ziegler, E. (2004). Exploring language variation on the Internet: Regional speech in a chat community. In Language variation in Europe: papers from the second international conference on language variation in Europe, ICLaVE, volume 2, pages 99–111.

  9. [9]

    Antonakaki, D., Fragopoulou, P., and Ioannidis, S. (2021). A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications, 164:114006. https://doi.org/10.1016/j.eswa.2020.114006

  10. [10]

    AxstaBludsta (2011). Nek Minute. Retrieved from https://youtu.be/CTZyorJVeqI

  11. [11]

    Ballantyne, T. (2011). On Place, Space and Mobility in Nineteenth-Century New Zealand. New Zealand Journal of History, 45(1):50–70. Retrieved from https://muse.jhu.edu/article/879358/

  12. [12]

    Ballard, E., Charters, H., Meyerhoff, M., and Watson, C. (2025). New Zealand, Multicultural Auckland English. In The Wiley Blackwell Encyclopedia of World Englishes, pages 1–10. John Wiley & Sons, Hoboken, NJ. https://doi.org/10.1002/9781119518297.eowe00118

  13. [13]

    Bamman, D., Dyer, C., and Smith, N. A. (2014). Distributed Representations of Geographically Situated Language. In Toutanova, K. and Wu, H., editors, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 828–834, Baltimore, Maryland. Association for Computational Linguistics.

  14. [14]

    Barber, C. L., Beal, J. C., and Shaw, P. A. (1993). The English Language: A Historical Introduction. Cambridge University Press, New York, NY, 2 edition.

  15. [15]

    Bardsley, D. (2006). A Specialist Study in New Zealand English Lexis: The Rural Sector. International Journal of Lexicography, 19(1):41–72. https://doi.org/10.1093/ijl/eci052

  16. [16]

    Bardsley, D. (2009). Lexicography in New Zealand. Technical report, New Zealand Dictionary Centre, Wellington, New Zealand.

  17. [17]

    Bardsley, D. and Simpson, J. (2009). Hypocoristics in New Zealand and Australian English. In Peters, P., Collins, P., and Smith, A., editors, Comparative Studies in Australian and New Zealand English: Grammar and beyond, pages 49–70. John Benjamins Publishing Company, Amsterdam, The Netherlands; Philadelphia, PA. https://doi.org/10.1075/veaw.g39.04bar

  18. [18]

    Bartlett, C. M. (1992). Regional Variation in New Zealand English: The Case of Southland. New Zealand English Newsletter, 6:5–15.

  19. [19]

    Bauer, L. (1987). New Zealand English morphology: Some experimental evidence. Te Reo – The Journal of the Linguistic Society of New Zealand, 30(1):37–53. Retrieved from https://nzlingsoc.org/journal_article/new-zealand-english-morphology-some-experimental-evidence/

  20. [20]

    Bauer, L. (1994a). Introducing the Wellington Corpus of Written New Zealand English. Te Reo – The Journal of the Linguistic Society of New Zealand, 37:21–28. Retrieved from https://nzlingsoc.org/journal_article/introducing-the-wellington-corpus-of-written-new-zealand-english/

  21. [21]

    Bauer, L. (1994b). Watching English Change: An Introduction to the Study of Linguistic Change in Standard Englishes in the 20th Century. Longman, Harlow, England. https://doi.org/10.4324/9781315844169

  22. [22]

    Bauer, L. (2007). Some Grammatical Features of New Zealand English. New Zealand English Journal, 21:1–25. Retrieved from https://search.informit.org/doi/10.3316/informit.555852519597653

  23. [23]

    Bauer, L. and Bauer, W. (2002). Can we watch regional dialects developing in colonial English?: The case of New Zealand. English World-Wide, 23(2):169–193. https://doi.org/10.1075/eww.23.2.02bau

  24. [24]

    Bauer, L. and Bauer, W. (2003). Playground Talk: Dialects and Change in New Zealand English. School of Linguistics and Applied Language Studies, Victoria University of Wellington, Wellington, New Zealand.

  25. [25]

    Baum, L. E. and Petrie, T. (1966). Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics, 37(6):1554–1563. Retrieved from https://www.jstor.org/stable/2238772

  26. [26]

    Baumgartner, J., Zannettou, S., Keegan, B., Squire, M., and Blackburn, J. (2020). The Pushshift Reddit Dataset. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 830–839, Atlanta, GA. PKP Publishing Services Network. https://doi.org/10.1609/icwsm.v14i1.7347

  27. [27]

    Bayard, D. (1989). 'Me Say That? No Way!': The social correlates of American lexical diffusion in New Zealand English. Te Reo, 32(1):17–60. Retrieved from https://nzlingsoc.org/journal_article/me-say-that-no-way-the-social-correlates-of-american-lexical-diffusion-in-new-zealand-english/

  28. [28]

    Bayard, D. (1991). Antipodean Accents and the "Cultural Cringe": New Zealand and American Attitudes Toward NZE and Other English Accents. Te Reo, 34(1):15–52. Retrieved from https://nzlingsoc.org/journal_article/antipodean-accents-and-the-cultural-cringe-new-zealand-and-american-attitudes-toward-nze-and-other-english-accents/

  29. [29]

    Beaman, K. V. (2021). Identity and mobility in linguistic change across the lifespan: The case of Swabian German. In Ziegler, A., Edler, S., and Oberdorfer, G., editors, Urban Matters: Current approaches in variationist sociolinguistics, pages 27–60. John Benjamins Publishing Company, Amsterdam, The Netherlands; Philadelphia, PA. https://doi.org/10.10...

  30. [30]

    Bell, C. (1996). Inventing New Zealand: Everyday Myths of Pakeha Identity. Penguin Books, Harmondsworth, England.

  31. [31]

    Berger, P. L. and Luckmann, T. (1966). The Social Construction of Reality: A Treatise in the Sociology of Knowledge. Anchor, London, England; New York, NY; Camberwell, Victoria, Australia; Toronto, Ontario, Canada; New Delhi, India; Auckland, New Zealand; Rosebank, South Africa.

  32. [32]

    Biber, D. and Conrad, S. (2009). Register, Genre, and Style. Cambridge Textbooks in Linguistics. Cambridge University Press, Cambridge, England. https://doi.org/10.1017/CBO9780511814358

  33. [33]

    Birznieks, L. (2020). The Perpetuation of Western Dominance through Online Discourse: A Critical Discourse Analysis of Reddit Comment Threads. Retrieved from https://studenttheses.uu.nl/handle/20.500.12932/36478

  34. [34]

    Blaschke, V., Purschke, C., Schuetze, H., and Plank, B. (2024). What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 823–841, Ba...

  35. [35]

    Blommaert, J. (2013). Writing as a sociolinguistic object. Journal of Sociolinguistics, 17(4):440–459. https://doi.org/10.1111/josl.12042

  36. [36]

    Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008. https://dx.doi.org/10.1088/1742-5468/2008/10/P10008

  37. [37]

    Bloomfield, L. (1933). Language. George Allen & Unwin, London, England.

  38. [38]

    Braun, V. and Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2):77–101. https://doi.org/10.1191/1478088706qp063oa

  39. [39]

    British Oracle (2019). Savvy B. Retrieved from http://savvy-b.urbanup.com/14111231

  40. [40]

    Bucher, T. (2013). Objects of Intense Feeling: The Case of the Twitter API. Computational Culture, 3. Retrieved from http://computationalculture.net/objects-of-intense-feeling-the-case-of-the-twitter-api/

  41. [41]

    Burrows, J. (2002). 'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship. Literary & Linguistic Computing, 17(3):267–287. https://doi.org/10.1093/llc/17.3.267

  42. [42]

    Cabitza, F., Campagner, A., and Basile, V. (2023). Toward a Perspectivist Turn in Ground Truthing for Predictive Computing. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6):6860–6868.

  43. [43]

    Calude, A. S. (2023). The Linguistics of Social Media: An Introduction. Taylor & Francis, Abingdon, England; New York, NY. https://doi.org/10.4324/9781003321873

  44. [44]

    Calude, A. S., Long, M., and Burnette, J. (2024). #AreHashtagsWords? Structure, position, and syntactic integration of hashtags in (English) tweets. Linguistics Vanguard, 10(1):105–114. https://doi.org/10.1515/lingvan-2023-0044

  45. [45]

    Cannon, G. (1985). Functional shift in English. Linguistics, 23(3):411–432. https://doi.org/10.1515/ling.1985.23.3.411

  46. [46]

    Cannon, G. (1986). Blends in English word formation. Linguistics, 24(4):725–754. https://doi.org/10.1515/ling.1986.24.4.725

  47. [47]

    Cannon, G. (1989). Abbreviations and Acronyms in English Word-Formation. American Speech, 64(2):99–127. https://doi.org/10.2307/455038

  48. [48]

    Cannon, G. and Bailey, G. (1986). Back-Formations in English Word-Formation. Meta, 31(4):427–438.

  49. [49]

    Carmichael, K. (2023). Locating place in variationist sociolinguistics: Making the case for ethnographically informed multidimensional place orientation metrics. Journal of Linguistic Geography, 11(2):65–77. https://doi.org/10.1017/jlg.2023.2

  50. [50]

    Carmichael, K. and Reed, P. E. (2025). Language and Place. Cambridge University Press, Cambridge, England; New York, NY; Port Melbourne, Victoria, Australia; New Delhi, India; Singapore, Singapore. https://doi.org/10.1017/9781009380874

  51. [51]

    Chambers, J. K. (2000). Region and language variation. English World-Wide, 21(2):169–199. https://doi.org/10.1075/eww.21.2.02cha

  52. [52]

    Chambers, J. K. and Trudgill, P. (1998). Dialectology. Cambridge Textbooks in Linguistics. Cambridge University Press, Cambridge, England, 2 edition. https://doi.org/10.1017/CBO9780511805103

  53. [53]

    Charmaz, K. (2006). Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. Introducing Qualitative Methods. SAGE Publications, London, England; Thousand Oaks, CA; New Delhi, India.

  54. [54]

    Chomsky, N. (1965). Aspects of the Theory of Syntax. The MIT Press, Cambridge, MA.

  55. [55]

    Church, K. W. and Hanks, P. (1990). Word Association Norms , Mutual Information , and Lexicography . Computational Linguistics , 16(1):22--29. Retrieved from https://aclanthology.org/J90-1003/

  56. [56]

    Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020). Unsupervised Cross -lingual Representation Learning at Scale . In Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J., editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguist...

  57. [57]

    Coupland, N. (2001). Introduction: Sociolinguistic Theory and Social Theory . In Coupland, N., Sarangi, S., and Candlin, C. N., editors, Sociolinguistics and Social Theory , Language in Social Life Series , pages 1--26. Longman, Harlow, England

  58. [58]

    and Nepusz, T

    Csárdi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems , 1695:1--9. Retrieved from https://igraph.org/

  59. [59]

    Cuming, A. (2013). Marmageddon no more. Waikato Times . Retrieved from https://www.stuff.co.nz/waikato-times/news/8450129/Marmageddon-no-more

  60. [60]

    Cutler, C. (2020). Metapragmatic comments and orthographic performances of a New York accent on YouTube . World Englishes , 39(1):36--53. https://doi.org/10.1111/weng.12444

  61. [62]

    Cutler, C., Ahmar, M., and Bahri, S. (2022b). Introduction: The Oralization of Digital Written Communication . In Cutler, C., Ahmar, M., and Bahri, S., editors, Digital Orality : Vernacular Writing in Online Spaces , pages 3--31. Springer International Publishing, Cham, Switzerland. https://doi.org/10.1007/978-3-031-10433-6\_1

  62. [63]

    Danescu-Niculescu-Mizil, C., Gamon, M., and Dumais, S. (2011). Mark my words! linguistic style accommodation in social media. In Proceedings of the 20th international conference on World wide web , WWW '11, pages 745--754, New York, NY, USA. Association for Computing Machinery. https://doi.org/10.1145/1963405.1963509

  63. [64]

    De Bres, J. (2010). Attitudes of non- Maori New Zealanders towards the use of Maori in New Zealand English . New Zealand English Journal , 24:2--14. Retrieved from https://search.informit.org/doi/10.3316/informit.206778324257255

  64. [65]

    and Nicholas, S

    de Bres, J. and Nicholas, S. A. (2021). The sexiest accent in the world: Linguistic insecurity and prejudice in media coverage of the New Zealand accent. Te Reo – The Journal of the Linguistic Society of New Zealand , 64(1):15--32. Retrieved from https://nzlingsoc.org/journal\_article/the-sexiest-accent-in-the-world-linguistic-insecurity-and-prejudice-in-...

  65. [66]

    Degani, M. (2012). Language contact in New Zealand: A focus on English lexical borrowings in Māori. Academic Journal of Modern Philology, 1:13--24

  66. [67]

    Desmarais, A.-M. (2020). Men who knit: A social media critical discourse study (SM-CDS) on the legitimisation of men within Reddit's r/knitting community. Retrieved from https://hdl.handle.net/10292/13594

  67. [68]

    Deverson, T. (2000). Handling New Zealand English lexis. In Bell, A. and Kuiper, K., editors, New Zealand English, pages 23--39. Victoria University Press, Amsterdam, The Netherlands; Philadelphia, PA

  68. [74]

    Deverson, T. and Kennedy, G. (2005f). ute. In The New Zealand Oxford Dictionary. Oxford University Press. https://doi.org/10.1093/acref/9780195584516.001.0001

  69. [75]

    Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1 of...

  70. [76]

    Di Sciullo, A.-M. and Williams, E. (1987). On the definition of word. Number 14 in Linguistic inquiry monographs. The MIT Press, Cambridge, MA

  71. [77]

    Dijkstra, J., Heeringa, W., Jongbloed-Faber, L., and Van de Velde, H. (2021). Using Twitter Data for the Study of Language Change in Low-Resource Languages. A Panel Study of Relative Pronouns in Frisian. Frontiers in Artificial Intelligence, 4. https://doi.org/10.3389/frai.2021.644554

  72. [78]

    Donald, S. (2018). It's a colonial thing: New Zealand cultural identity and the use of 'colony' as a social category in intercultural communication. New Zealand Studies in Applied Linguistics, 24(1):5--17. Retrieved from https://search.informit.org/doi/abs/10.3316/INFORMIT.740485668341927

  73. [79]

    Dourish, P. (2004). What we talk about when we talk about context. Personal and Ubiquitous Computing, 8(1):19--30. https://doi.org/10.1007/s00779-003-0253-8

  74. [80]

    Dourish, P. (2006). Re-space-ing place: "place" and "space" ten years on. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, pages 299--308, New York, NY. Association for Computing Machinery. https://doi.org/10.1145/1180875.1180921

  75. [81]

    Duhamel, M.-F. and Meyerhoff, M. (2015). An end of egalitarianism? Social evaluations of language difference in New Zealand. Linguistics Vanguard, 1(1):235--248. https://doi.org/10.1515/lingvan-2014-1005

  76. [82]

    Dunn, J. (2017). Computational learning of construction grammars. Language and Cognition, 9(2):254--292. https://doi.org/10.1017/langcog.2016.7

  77. [83]

    Dunn, J. (2019a). Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology. Frontiers in Artificial Intelligence, 2:15. https://doi.org/10.3389/frai.2019.00015

  78. [84]

    Dunn, J. (2019b). Modeling Global Syntactic Variation in English Using Dialect Classification. In Zampieri, M., Nakov, P., Malmasi, S., Ljubešić, N., Tiedemann, J., and Ali, A., editors, Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 42--53, Ann Arbor, Michigan. Association for Computational Linguistics. h...

  79. [85]

    Dunn, J. (2020). Mapping languages: the Corpus of Global Language Use. Language Resources and Evaluation, 54(4):999--1018. https://doi.org/10.1007/s10579-020-09489-2

  80. [86]

    Dunn, J. (2022). Natural Language Processing for Corpus Linguistics. Cambridge University Press, Cambridge, England; New York, NY; Port Melbourne, Victoria, Australia; New Delhi, India; Singapore, Singapore. https://doi.org/10.1017/9781009070447

Showing first 80 references.