pith. machine review for the scientific record.

arxiv: 2605.04371 · v1 · submitted 2026-05-06 · 💻 cs.SI

Recognition: unknown

Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps

Authors on Pith no claims yet

Pith reviewed 2026-05-08 17:00 UTC · model grok-4.3

classification 💻 cs.SI
keywords time zone inference · reddit · circadian rhythms · online communities · temporal activity · geographic spread · activity timestamps

The pith

Reddit community time zones can be inferred from activity timestamps alone, with the best-performing method reaching sub-30-minute accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that time zones of online communities can be accurately inferred solely from their hourly activity counts by using the circadian pattern where activity is lowest around 4 a.m. local time. This approach matters because it provides a scalable, privacy-preserving way to measure the geographic spread of global online platforms like Reddit without requiring user location data or surveys. The authors demonstrate that even a simple heuristic performs nearly as well as more sophisticated time-domain and frequency-domain methods, achieving accuracy within one hour on average, and that peak performance is reached with fewer than 1,000 comments. They validate this by showing correlation with known geographical distributions and apply it to track Reddit's evolution over two decades across various community types.

Core claim

We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, the best-performing method is accurate to a sub-30-minute resolution, and fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average.
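The core claim pits time-domain methods against the heuristic. As a minimal sketch of what a time-domain estimator can look like, the Python below uses circular template matching: it slides a reference 24-hour profile, indexed by local hour, around the clock and keeps the shift that correlates best with the observed UTC histogram. This is an illustrative technique, not necessarily one of the paper's six methods; the function name `infer_offset_template` and the source of `reference_local` (for example, an average over communities with known offsets) are assumptions.

```python
import numpy as np


def infer_offset_template(utc_counts, reference_local):
    """Hypothetical time-domain estimator: circular template matching.

    utc_counts      -- 24 hourly activity counts indexed by UTC hour.
    reference_local -- 24-value reference profile indexed by LOCAL hour
                       (assumed input, e.g. averaged over known-offset communities).
    Returns the whole-hour UTC offset in [-11, +12] with the best alignment.
    """
    x = np.asarray(utc_counts, dtype=float)
    ref = np.asarray(reference_local, dtype=float)
    if x.shape != (24,) or ref.shape != (24,):
        raise ValueError("both profiles must have 24 hourly bins")
    # z-score both profiles so overall activity volume does not affect the match
    x = (x - x.mean()) / (x.std() + 1e-12)
    ref = (ref - ref.mean()) / (ref.std() + 1e-12)
    utc_hours = np.arange(24)
    best_offset, best_score = 0, -np.inf
    for offset in range(-11, 13):
        # local hour = UTC hour + offset, so view the reference on the UTC axis
        candidate = ref[(utc_hours + offset) % 24]
        score = float(np.dot(x, candidate))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

Z-scoring makes the comparison a correlation, so a small community and a large one with the same daily shape receive the same offset.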

What carries the argument

The circadian-minimum heuristic: assume that community activity bottoms out around 4 a.m. local time, find the UTC hour at which the observed hourly activity counts are lowest, and take the time zone offset as the difference between the two.
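A minimal Python sketch of this heuristic, assuming the input is a 24-element array of comment counts indexed by UTC hour. The function name and the optional 3-hour circular smoothing are hypothetical additions. Because a daily histogram is periodic, the result is only defined modulo 24 hours; the sketch wraps it to [-11, +12], which cannot, for instance, distinguish UTC+13 from UTC-11.

```python
import numpy as np


def infer_offset_min_hour(utc_counts, assumed_min_local=4, smooth=False):
    """Hypothetical sketch of the 4 a.m. circadian-minimum heuristic.

    utc_counts -- 24 hourly comment counts indexed by UTC hour (0..23).
    Returns a whole-hour UTC offset wrapped to [-11, +12].
    """
    counts = np.asarray(utc_counts, dtype=float)
    if counts.shape != (24,):
        raise ValueError("expected 24 hourly counts indexed by UTC hour")
    if smooth:
        # optional 3-hour circular moving average to damp hour-to-hour noise
        counts = (np.roll(counts, 1) + counts + np.roll(counts, -1)) / 3.0
    min_utc_hour = int(np.argmin(counts))  # first UTC hour with the lowest count
    # local = UTC + offset, and the minimum is assumed to fall at assumed_min_local
    offset = assumed_min_local - min_utc_hour
    return ((offset + 11) % 24) - 11
```

For example, a community whose quietest UTC hour is 09:00 is assigned 4 - 9 = -5, that is, UTC-5. Taking the first minimum on ties and working in whole hours are choices of this sketch; sub-hour offsets would need an interpolated or frequency-domain estimate.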

Load-bearing premise

The assumption that community activity reaches its minimum around 4 a.m. local time holds across different cultures and community types.

What would settle it

Finding a community where the activity minimum occurs at a significantly different local hour, causing the inferred time zone to be off by more than one hour.
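Deciding whether an inferred offset is off by more than one hour requires a distance that respects the 24-hour wrap-around: UTC+11 and UTC-12 are one hour apart, not 23. The paper's Figure 5 reports absolute circular errors; the helper below is a minimal Python sketch of such a distance, with a hypothetical name, and may differ from the authors' exact definition.

```python
def circular_error_hours(inferred, actual, period=24.0):
    """Absolute circular distance between two UTC offsets, in hours (0 to 12)."""
    d = abs(inferred - actual) % period
    return min(d, period - d)
```

Here `circular_error_hours(11, -12)` returns 1.0, whereas a naive absolute difference would report 23 hours.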

Figures

Figures reproduced from arXiv: 2605.04371 by Franco Della Negra, Matteo Cinelli, Mattia Samory.

Figure 1: Overview of the processing pipeline for the six inference methods compared in this work.
Figure 2: Time-domain feature distributions. (Left) Hourly activity distributions across different UTC offsets.
Figure 3: Frequency-domain feature distributions. (Left) The wavelet power spectrum appears noisy, in time-zone…
Figure 4: Confusion matrices of actual vs. inferred UTC offsets. A strong diagonalization of the predictions…
Figure 5: Distribution of absolute circular errors. The time-domain reference methods maintain tight error…
Figure 6: Geographic distribution of time zone inference errors. High-error nodes (yellow/light green) are not…
Figure 7: Inference reliability relative to data volume. Time-domain representations (Activity Counts and Activity…
Figure 8: Inferred UTC offsets for subreddits in the Sports category.
Figure 9: Longitudinal distribution of Reddit activity (2009–2024). The density curves demonstrate a historical…
Figure 10: Longitudinal distribution of Reddit activity per subreddit topic. Topics like Technology, Place, and…
Figure 11: Evolution of Reddit’s Geographic Dispersion (2005–2024). The Gini Coefficient, calculated on the…
Figure 12: Heatmap of the Log2 Fold Change in subreddit volume (Base Year = 2012). Following the 2011–2013…
Figure 13: Degradation of inference accuracy in relation to data scarcity. The left panel illustrates accuracy as a…
Figure 14: Inferred UTC offsets mapped to semantic categories. The distribution accurately reflects real-world…
Figure 15: Inferred UTC offsets for subreddits in the Business, Economics, and Finance category.
Figure 16: Inferred UTC offsets for subreddits in the Food and Drink category.
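Figures 11 and 12 summarize Reddit's dispersion with a Gini coefficient and with a log2 fold change in subreddit volume against a 2012 base year. The Python sketch below computes both quantities under stated assumptions: the Gini input is taken to be per-bin activity volumes (for example, per UTC offset), because the caption saying what it is calculated on is truncated, and the pseudocount in the fold change is a hypothetical choice to avoid taking the log of zero.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative volumes: 0 = perfectly even, (n-1)/n = all in one bin."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return float(2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n)


def log2_fold_change(volume, base_volume, pseudocount=1.0):
    """log2((volume + c) / (base + c)); the pseudocount c is an assumption that avoids log(0)."""
    v = np.asarray(volume, dtype=float)
    b = np.asarray(base_volume, dtype=float)
    return np.log2((v + pseudocount) / (b + pseudocount))
```

A falling Gini over the years would indicate activity spreading more evenly across time zones; a positive log2 fold change marks a volume above its 2012 level, with +1 meaning roughly double.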
original abstract

Online communities are a global phenomenon, but assessing their actual geographical spread requires accurate and scalable measurement. We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, we show that the best-performing method is accurate to a sub-30-minute resolution, and that fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average. This simple method correlates significantly with the actual distribution of Reddit's geographical spread; we validate its generalizability across communities organized around diverse cultural phenomena, from sports to finance, and apply it at scale to characterize the geographic evolution of Reddit from its founding to the present. Our method is portable across platforms and requires no user disclosure, making it a practical baseline for any study that must account for the geographic structure of online behavior.
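The abstract mentions frequency-domain methods without further detail on this page. One standard frequency-domain estimator, sketched here in Python as an illustration rather than as the authors' method, reads the phase of the 24-hour (first-harmonic) component of the discrete Fourier transform. It returns a continuous offset, so half-hour zones such as UTC+5:30 are representable. The sketch assumes the trough sits 12 hours after the peak of the fitted sinusoid. Real, asymmetric activity profiles need not satisfy this, so the result would likely need calibration against communities with known offsets; the function name is hypothetical.

```python
import numpy as np


def infer_offset_fft(utc_counts, assumed_min_local=4.0):
    """Hypothetical frequency-domain estimator from the 24-hour harmonic's phase.

    utc_counts -- 24 hourly activity counts indexed by UTC hour.
    Returns a continuous UTC offset wrapped to [-12, +12).
    """
    x = np.asarray(utc_counts, dtype=float)
    if x.shape != (24,):
        raise ValueError("expected 24 hourly counts indexed by UTC hour")
    # Bin 1 of a 24-point DFT is the component with a 24-hour period.
    first_harmonic = np.fft.fft(x)[1]
    # For x[h] = a + b*cos(2*pi*(h - p)/24) with b > 0, angle(X[1]) = -2*pi*p/24.
    peak_utc = (-np.angle(first_harmonic) * 24.0 / (2.0 * np.pi)) % 24.0
    trough_utc = (peak_utc + 12.0) % 24.0  # a pure sinusoid bottoms out 12 h after its peak
    offset = assumed_min_local - trough_utc
    return float(((offset + 12.0) % 24.0) - 12.0)
```

Because the phase is estimated from all 24 bins at once, this approach is less sensitive to a single noisy hour than a plain argmin.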

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents methods to infer the time zone of online communities, particularly on Reddit, using only their hourly activity timestamps. Grounded in the circadian rhythm where activity minima occur around 4 a.m. local time, the authors compare a simple heuristic with more complex time- and frequency-domain approaches. They report that the best method achieves sub-30-minute accuracy, that performance saturates with fewer than 1000 comments, and that the heuristic recovers time zones within one hour on average. The approach is validated against geographic distributions, shown to generalize across diverse communities, and applied to track Reddit's geographic evolution over 20 years.

Significance. If the results hold, this work offers a practical, scalable, and privacy-preserving tool for measuring the geographic structure of online platforms. It enables large-scale studies of globalization in digital communities without requiring user-level data or platform cooperation. The finding that a simple heuristic nearly matches complex methods is particularly useful for broad adoption in social computing research.

major comments (1)
  1. [Abstract] The accuracy claims, including sub-30-minute resolution and the one-hour margin for the heuristic, depend critically on the assumption that the activity minimum is stably located at 4 a.m. local time across all tested communities. The manuscript states this is 'well-established' but provides no quantitative validation or sensitivity analysis showing that the minimum remains within, e.g., ±1 hour for the diverse communities tested (sports, finance, non-Western users, or post-2015 periods). If the true minimum deviates systematically, all reported error metrics would be offset by that amount, undermining the correlation with actual geographic spread.
minor comments (1)
  1. [Abstract] The abstract could more explicitly state the number of communities or subreddits analyzed and the exact performance curve details supporting the saturation claim at under 1000 comments.
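One way to produce the saturation curve the minor comment asks for is subsampling: draw n comments from a community's hourly distribution, re-run the estimator, and average the circular error over repeated draws. The Python below is a hypothetical sketch of that protocol using the 4 a.m. argmin heuristic; the multinomial sampling model, the function names, and the `trials` default are assumptions, not the paper's procedure.

```python
import numpy as np


def argmin_offset(counts, assumed_min_local=4):
    """4 a.m. argmin heuristic on UTC-indexed counts, wrapped to [-11, +12]."""
    h = int(np.argmin(counts))
    return ((assumed_min_local - h + 11) % 24) - 11


def circ_err(a, b):
    """Absolute circular distance between two offsets on a 24-hour clock."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def error_vs_sample_size(utc_profile, true_offset, sizes, trials=200, seed=0):
    """Mean circular error of the heuristic when only n comments are observed.

    utc_profile -- 24 expected activity weights indexed by UTC hour (assumed known).
    sizes       -- iterable of sample sizes n to evaluate.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(utc_profile, dtype=float)
    p = p / p.sum()
    curve = {}
    for n in sizes:
        errs = []
        for _ in range(trials):
            counts = rng.multinomial(n, p)  # simulated hourly counts for n comments
            errs.append(circ_err(argmin_offset(counts), true_offset))
        curve[n] = float(np.mean(errs))
    return curve
```

Plotting `curve` against n for many communities would show where additional comments stop reducing the error, which is the kind of evidence the saturation claim rests on.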

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. We address the major comment on the activity minimum assumption below and will revise the manuscript to incorporate additional analysis as outlined.

point-by-point responses
  1. Referee: [Abstract] The accuracy claims, including sub-30-minute resolution and the one-hour margin for the heuristic, depend critically on the assumption that the activity minimum is stably located at 4 a.m. local time across all tested communities. The manuscript states this is 'well-established' but provides no quantitative validation or sensitivity analysis showing that the minimum remains within, e.g., ±1 hour for the diverse communities tested (sports, finance, non-Western users, or post-2015 periods). If the true minimum deviates systematically, all reported error metrics would be offset by that amount, undermining the correlation with actual geographic spread.

    Authors: We appreciate the referee highlighting this foundational assumption. The 4 a.m. minimum draws from established circadian rhythm literature, which we cite and describe in the manuscript. We acknowledge that the original submission does not contain a dedicated quantitative sensitivity analysis varying the minimum location across the specific community subsets mentioned. However, our accuracy claims are not solely dependent on the assumption in isolation; they are empirically validated by comparing inferred time zones against the actual geographic distributions of Reddit users. Systematic deviations from 4 a.m. would produce mismatches in this geographic correlation, which our results do not show. To directly respond to the concern, we will add a sensitivity analysis in the revised manuscript. This will test shifts in the assumed minimum (e.g., 3:00 a.m. to 5:00 a.m. in 30-minute steps) and report effects on the sub-30-minute accuracy and one-hour heuristic margin, broken down by community categories including sports, finance, and post-2015 temporal subsets. We will also examine non-Western communities using available subreddit data where sample sizes permit. This addition will quantify robustness and address potential offsets explicitly. revision: yes
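The sweep promised in the rebuttal (assumed minimum from 3:00 to 5:00 a.m. in 30-minute steps) can be sketched as follows. Under the argmin heuristic, moving the assumed minimum by δ hours shifts every inferred offset by δ, so the sweep amounts to asking which reference hour minimizes mean circular error on communities with known offsets. The labeled validation set and all names are hypothetical; per-category breakdowns (sports, finance, post-2015) would come from calling the function on each subset.

```python
import numpy as np


def heuristic_offset(utc_counts, assumed_min_local):
    """Argmin heuristic with a possibly fractional assumed minimum, wrapped to [-12, +12)."""
    h = int(np.argmin(np.asarray(utc_counts, dtype=float)))
    return ((assumed_min_local - h + 12.0) % 24.0) - 12.0


def circular_error(a, b):
    """Absolute circular distance between two offsets on a 24-hour clock."""
    d = abs(a - b) % 24.0
    return min(d, 24.0 - d)


def sensitivity_sweep(labeled, start=3.0, stop=5.0, step=0.5):
    """Mean circular error for each assumed local-minimum hour.

    labeled -- list of (utc_counts, true_offset) pairs for communities whose
               offset is known (a hypothetical validation set).
    """
    results = {}
    for m in np.arange(start, stop + step / 2, step):
        errs = [circular_error(heuristic_offset(c, m), t) for c, t in labeled]
        results[float(m)] = float(np.mean(errs))
    return results
```

If the curve bottoms out well away from 4.0 for some category, that would be direct evidence of the systematic offset the referee is worried about.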

Circularity Check

0 steps flagged

No significant circularity; inference validated against independent geographic benchmarks

full rationale

The paper's core pipeline aligns observed activity histograms to a 4 a.m. local minimum drawn from an external, well-established circadian literature finding. Accuracy, saturation at <1000 comments, and geographic-evolution maps are then measured by direct correlation against independent Reddit geographic-spread data and cross-community validation (sports, finance, etc.). No equations, fitted parameters, or self-citations are shown to reduce the reported error metrics or time-zone outputs to the input assumption by construction. The derivation remains externally falsifiable and does not collapse into self-definition or tautological renaming.
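Because UTC offsets are angles on a 24-hour circle, a linear Pearson correlation can be distorted near the wrap-around. The paper's reference list includes Jammalamadaka and Sarma's (1988) correlation coefficient for angular variables; the Python sketch below implements that coefficient for two sets of offsets. Whether the paper uses it for the geographic comparison described here is an assumption, and the function names are hypothetical.

```python
import numpy as np


def to_angle(offset_hours):
    """Map UTC offsets in hours onto angles in radians on a 24-hour circle."""
    return 2.0 * np.pi * np.asarray(offset_hours, dtype=float) / 24.0


def circular_mean(theta):
    """Mean direction of a set of angles."""
    return np.arctan2(np.sin(theta).sum(), np.cos(theta).sum())


def circular_correlation(offsets_a, offsets_b):
    """Jammalamadaka-Sarma circular correlation between paired UTC offsets."""
    a, b = to_angle(offsets_a), to_angle(offsets_b)
    sa = np.sin(a - circular_mean(a))
    sb = np.sin(b - circular_mean(b))
    return float(np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2)))
```

The coefficient is unchanged when every offset in one set is shifted by the same amount, so a constant bias in the assumed 4 a.m. minimum would not lower it. That is one reason a high correlation does not by itself rule out the systematic offset raised in the referee report.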

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests primarily on the domain assumption of circadian posting rhythms and the specific choice of a 4 a.m. activity minimum; no free parameters are explicitly fitted beyond this heuristic, and no new entities are introduced.

free parameters (1)
  • 4 a.m. activity minimum
    The heuristic assumes the daily activity minimum occurs around 4 a.m. local time as a fixed reference point.
axioms (1)
  • domain assumption: posting rhythms encode circadian structure
    The abstract explicitly grounds the approach in this "well-established" finding.

pith-pipeline@v0.9.0 · 5510 in / 1440 out tokens · 87761 ms · 2026-05-08T17:00:30.532169+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 19 canonical work pages · 1 internal anchor

  1. [1]

    L. Alarfaj, J. Blackburn, M. Amjad, J. Patel, and Z. Ertem. 2025. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information 16, 9 (2025), 748. doi:10.3390/info16090748

  2. [2]

    Michele Avalle, Niccolò Di Marco, Gabriele Etta, Emanuele Sangiorgio, Shayan Alipour, Anita Bonetti, Lorenzo Alvisi, Antonio Scala, Andrea Baronchelli, Matteo Cinelli, et al. 2024. Persistent interaction patterns across social media platforms and over time. Nature 628, 8008 (2024), 582–589

  3. [3]

    Christopher Bail, Lisa Argyle, Taylor Brown, John Bumpus, Haohan Chen, M.B. Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, and Alexander Volfovsky. 2018. Exposure to opposing views can increase political polarization: evidence from a large-scale field experiment on social media. SocArXiv XXX (2018), 1–6. arXiv:1408.1149 doi:10.17605/osf.io/4ygux

  4. [4]

    Duilio Balsamo, Paolo Bajardi, and André Panisson. 2019. Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort. The World Wide Web Conference (WWW ’19) (2019), 2572–2579. arXiv:1904.00003 doi:10.1145/3308558.3313634

  5. [5]

    C. Berragan, A. Singleton, A. Calafiore, and J. Morley. 2023. Evaluating the similarity of location-based corpora identified in Reddit comments. In Proceedings of the First Workshop on Geographic Information Extraction from Texts (GeoExT 2023) co-located with ECIR 2023 (CEUR Workshop Proceedings, Vol. 3385). 1–6

  6. [6]

    Lia Bozarth, Daniele Quercia, Licia Capra, and Sanja Šćepanović. 2023. The role of the big geographic sort in online news circulation among U.S. Reddit users. Scientific Reports 13, 1 (2023), 6711. doi:10.1038/s41598-023-33247-3

  7. [7]

    Stevie Chancellor and Munmun De Choudhury. 2020. Methods in predictive techniques for mental health status on social media: a critical review. npj Digital Medicine 3 (2020). doi:10.1038/s41746-020-0233-7

  8. [8]

    Bo-Chiuan Chen, Dong-Chul Seo, Hsien-Chang Lin, and David Crandall. 2018. Framework for estimating sleep timing from digital footprints. BMJ Innovations 4, 4 (2018). doi:10.1136/bmjinnov-2018-000270

  9. [9]

    Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013. Predicting postpartum changes in emotion and behavior via social media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13) (2013), 3267. doi:10.1145/2470654.2466447

  10. [10]

    Jacob Eisenstein. 2018. Identifying Regional Dialects in On-Line Social Media. The Handbook of Dialectology 2013 (2018), 368–383. doi:10.1002/9781118827628.ch21

  11. [11]

    Casey Fiesler and Nicholas Proferes. 2018. “Participant” Perceptions of Twitter Research Ethics. Social Media and Society 4 (2018). doi:10.1177/2056305118763366

  12. [12]

    Kambiz Ghoorchian and Sarunas Girdzijauskas. 2018. Spatio-Temporal Multiple Geo-Location Identification on Twitter. In 2018 IEEE International Conference on Big Data. 1895–1902. doi:10.1109/BigData.2018.8622131

  13. [13]

    Keith Harrigian. 2018. Geocoding without Geotags: A Text-Based Approach for Reddit. arXiv preprint arXiv:1810.03067

  14. [14]

    M. Hoffmann and A. Heft. 2020. “Here, There and Everywhere”: Classifying Location Information in Social Media Data – Possibilities and Limitations. Communication Methods and Measures 14, 3 (2020), 184–203. doi:10.1080/19312458.2019.1708282

  15. [15]

    S Rao Jammalamadaka and Y Ramakrishna Sarma. 1988. A correlation coefficient for angular variables. Statistical theory and data analysis II (1988), 349–364

  16. [16]

    Sean Kates, Joshua Tucker, Jonathan Nagler, and Richard Bonneau. 2021. The Times They Are Rarely A-Changin’: Circadian Regularities in Social Media Use. Journal of Quantitative Description: Digital Media 1 (2021). doi:10.51685/jqd.2021.017

  17. [17]

    Massimo La Morgia, Alessandro Mei, Eugenio Nerio Nemmi, Simone Raponi, and Julinda Stefa. 2019. Nationality and geolocation-based profiling in the dark (web). IEEE Transactions on Services Computing 15, 1 (2019), 429–441.

  18. [18]

    J. Mahmud, J. Nichols, and C. Drews. 2012. Where Is This Tweet From? Inferring Home Locations of Twitter Users. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 6. 228–235

  19. [19]

    J. Mahmud, J. Nichols, and C. Drews. 2014. Home Location Identification of Twitter Users. arXiv preprint arXiv:1403.2345

  20. [20]

    W. Meyerson, S. Fineberg, Y. Song, A. Faber, G. Ash, F. Andrade, P. Corlett, M. Gerstein, and R. Hoyle. 2023. Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys. JMIR Formative Research 7 (2023), e38112. doi:10.2196/38112

  21. [21]

    Corrado Monti, Jacopo D’Ignazi, Michele Starnini, and Gianmarco De Francisci Morales. 2023. Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit. arXiv (2023). arXiv:2302.07598 doi:10.48550/arxiv.2302.07598

  22. [22]

    A. Morales, V. Vavilala, Rosa M. Benito, and Y. Bar-Yam. 2017. Global patterns of synchronization in human communications. Journal of The Royal Society Interface 14 (2017). doi:10.1098/rsif.2016.1048

  23. [23]

    Marcos Oliveira, Eraldo Ribeiro, Carmelo Bastos-filho, and Ronaldo Menezes. [n. d.]. Spatio-temporal variations in the urban rhythm: the travelling waves of crime. ([n. d.]). arXiv:1807.02989

  24. [24]

    Juergen Pfeffer, Daniel Matter, Kokil Jaidka, Onur Varol, Afra Jahanbakhsh Mashhadi, Jana Lasser, Dennis Assenmacher, Siqi Wu, Diyi Yang, Cornelia Brantner, Daniel M. Romero, Jahna Otterbacher, Carsten Schwemmer, Kenneth Joseph, David Garcia, and Fred Morstatter. 2023. Just Another Day on Twitter: A Complete 24 Hours of Twitter Data. In International Conf...

  25. [25]

    Tiziano Piccardi, Martin Gerlach, and Robert West. 2024. Curious rhythms: Temporal regularities of wikipedia consumption. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 18. 1249–1261

  26. [26]

    Howard Rheingold. 1993. The Virtual Community: Homesteading on the Electronic Frontier. Addison-Wesley

  27. [27]

    T. Scheffler and C. Kyba. 2016. Measuring Social Jetlag in Twitter Data. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 10. 675–678. doi:10.1609/icwsm.v10i1.14789

  28. [28]

    Mila Stillman and Anna M. Kruspe. 2024. Geolocation Extraction From Reddit Text Data. In GeoExT@ECIR

  29. [29]

    stuck_in_the_matrix, Watchful1, and RaiderBDev. [n. d.]. Reddit comments/submissions 2005-06 to 2024-12. ([n. d.])

  30. [30]

    Anna Tigunova, Paramita Mirza, Andrew Yates, and Gerhard Weikum. 2020. RedDust: a Large Reusable Dataset of Reddit User Traits. In Proceedings of the Twelfth Language Resources and Evaluation Conference. https://aclanthology.org/2020.lrec-1.751/

  31. [31]

    Sherry Turkle. 1995. Life on the Screen: Identity in the Age of the Internet. Simon & Schuster

  32. [32]

    Isaac Waller and Ashton Anderson. 2021. Quantifying social organization and political polarization in online platforms. Nature 600 (2021), 264–268. arXiv:2010.00590 doi:10.1038/s41586-021-04167-x

  33. [33]

    Ke Zhou, M. Constantinides, D. Quercia, and S. Šćepanović. 2023. How Circadian Rhythms Extracted from Social Media Relate to Physical Activity and Sleep. Proceedings of the International AAAI Conference on Web and Social Media 17, 1 (2023), 948–959. doi:10.1609/icwsm.v17i1.22202

A Population Location Deconvolution
Assigning a subreddit’s entire user base e...