Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps
Pith reviewed 2026-05-08 17:00 UTC · model grok-4.3
The pith
Reddit community time zones can be inferred from activity timestamps alone to sub-30-minute accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, the best-performing method is accurate to a sub-30-minute resolution, and fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average.
What carries the argument
The circadian-minimum heuristic: it assumes that a community's lowest activity occurs around 4 a.m. local time, then uses the hour at which the observed hourly activity counts reach their minimum to determine the time zone offset.
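A minimal sketch of this heuristic, assuming the input is a list of 24 activity counts indexed by UTC hour. The function name `infer_utc_offset` and the `assumed_min_hour` parameter are illustrative and do not come from the paper; ties for the minimum are broken by taking the earliest hour, and the result is wrapped into [-12, 12), which is a simplification of real-world offsets.

```python
def infer_utc_offset(hourly_counts_utc, assumed_min_hour=4):
    """Infer a UTC offset (in hours) from 24 activity counts indexed by UTC hour.

    Assumption: the quietest hour corresponds to `assumed_min_hour` local time.
    """
    if len(hourly_counts_utc) != 24:
        raise ValueError("expected 24 hourly counts indexed by UTC hour")

    # UTC hour with the lowest activity (earliest hour wins on ties).
    utc_min_hour = min(range(24), key=lambda h: hourly_counts_utc[h])

    # local_time = utc_time + offset, so offset = local_min - utc_min (mod 24).
    offset = (assumed_min_hour - utc_min_hour) % 24

    # Wrap into [-12, 12) so that, e.g., 19 is reported as -5.
    return offset - 24 if offset >= 12 else offset


# Toy example: a community that is quietest at 09:00 UTC.
counts = [100] * 24
counts[9] = 5
print(infer_utc_offset(counts))
```

For the toy counts above, the quietest hour is 09:00 UTC, so the sketch reports an offset of -5 hours.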
Load-bearing premise
The assumption that community activity reaches its minimum around 4 a.m. local time holds across different cultures and community types.
What would settle it
Finding a community where the activity minimum occurs at a significantly different local hour, causing the inferred time zone to be off by more than one hour.
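The check described here can be sketched as follows, assuming a community whose true UTC offset is known independently. The helper names `local_minimum_hour` and `circular_hour_gap`, and the toy data, are hypothetical and only illustrate the arithmetic.

```python
def local_minimum_hour(hourly_counts_utc, true_utc_offset):
    """Local hour at which activity is lowest, given a known UTC offset."""
    utc_min_hour = min(range(24), key=lambda h: hourly_counts_utc[h])
    return (utc_min_hour + true_utc_offset) % 24


def circular_hour_gap(hour_a, hour_b):
    """Shortest distance between two clock hours on a 24-hour circle."""
    diff = (hour_a - hour_b) % 24
    return min(diff, 24 - diff)


# Toy community: known offset UTC+9, quietest at 22:00 UTC (07:00 local).
counts = [100] * 24
counts[22] = 5
local_min = local_minimum_hour(counts, true_utc_offset=9)
gap = circular_hour_gap(local_min, 4)

# A gap above one hour would push the heuristic's error past its stated margin.
print(local_min, gap, gap > 1)
```

In this toy case the local minimum falls at 07:00, three hours from the assumed 4 a.m., so the heuristic would misplace the community by three hours.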
Original abstract
Online communities are a global phenomenon, but assessing their actual geographical spread requires accurate and scalable measurement. We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, we show that the best-performing method is accurate to a sub-30-minute resolution, and that fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average. This simple method correlates significantly with the actual distribution of Reddit's geographical spread; we validate its generalizability across communities organized around diverse cultural phenomena, from sports to finance, and apply it at scale to characterize the geographic evolution of Reddit from its founding to the present. Our method is portable across platforms and requires no user disclosure, making it a practical baseline for any study that must account for the geographic structure of online behavior.
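The abstract does not say which frequency-domain estimator the authors use. One common option, shown here purely as an assumption, is to take the phase of the 24-hour Fourier component of the hourly counts, treat the trough of that sinusoid as the assumed 4 a.m. local minimum, and read off a fractional offset. The function name `first_harmonic_offset` is hypothetical.

```python
import cmath
import math


def first_harmonic_offset(hourly_counts_utc, assumed_min_hour=4.0):
    """Estimate a fractional UTC offset from the phase of the 24-hour harmonic.

    Assumption (not taken from the paper): the trough of the fitted 24-hour
    sinusoid corresponds to `assumed_min_hour` local time.
    """
    if len(hourly_counts_utc) != 24:
        raise ValueError("expected 24 hourly counts indexed by UTC hour")

    theta = 2 * math.pi / 24
    # First Fourier coefficient (one cycle per 24 hours).
    c1 = sum(c * cmath.exp(-1j * theta * h)
             for h, c in enumerate(hourly_counts_utc))
    if abs(c1) < 1e-9 * max(1.0, sum(hourly_counts_utc)):
        raise ValueError("no measurable 24-hour component")

    # For counts ~ cos(theta * (h - peak)), the phase of c1 is -theta * peak.
    peak_hour = (-cmath.phase(c1) / theta) % 24
    trough_hour = (peak_hour + 12) % 24

    offset = (assumed_min_hour - trough_hour) % 24
    return offset - 24 if offset >= 12 else offset


# Synthetic rhythm peaking at 20:30 UTC, so its trough is at 08:30 UTC.
counts = [100 + 50 * math.cos(2 * math.pi * (h - 20.5) / 24) for h in range(24)]
print(round(first_harmonic_offset(counts), 3))
```

Because the phase is continuous, this kind of estimator is not limited to whole hours, which is one plausible route to sub-hour resolution; whether it matches the paper's method is not established by the text here.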
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents methods to infer the time zone of online communities, particularly on Reddit, using only hourly activity counts derived from their timestamps. Building on the circadian regularity that activity reaches its minimum around 4 a.m. local time, the authors compare a simple heuristic with more complex time- and frequency-domain approaches. They report that the best method achieves sub-30-minute accuracy, that performance saturates with fewer than 1,000 comments, and that the heuristic recovers time zones within one hour on average. The approach is validated against geographic distributions, shown to generalize across diverse communities, and applied to track Reddit's geographic evolution over 20 years.
Significance. If the results hold, this work offers a practical, scalable, and privacy-preserving tool for measuring the geographic structure of online platforms. It enables large-scale studies of globalization in digital communities without requiring user-level data or platform cooperation. The finding that a simple heuristic nearly matches complex methods is particularly useful for broad adoption in social computing research.
major comments (1)
- [Abstract] The accuracy claims, including sub-30-minute resolution and the one-hour margin for the heuristic, depend critically on the assumption that the activity minimum is stably located at 4 a.m. local time across all tested communities. The manuscript states this is 'well-established' but provides no quantitative validation or sensitivity analysis showing that the minimum remains within, e.g., ±1 hour for the diverse communities tested (sports, finance, non-Western users, or post-2015 periods). If the true minimum deviates systematically, all reported error metrics would be offset by that amount, undermining the correlation with actual geographic spread.
minor comments (1)
- [Abstract] The abstract could more explicitly state the number of communities or subreddits analyzed and the exact performance curve details supporting the saturation claim at under 1000 comments.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address the major comment on the activity minimum assumption below and will revise the manuscript to incorporate additional analysis as outlined.
Point-by-point responses
Referee: [Abstract] The accuracy claims, including sub-30-minute resolution and the one-hour margin for the heuristic, depend critically on the assumption that the activity minimum is stably located at 4 a.m. local time across all tested communities. The manuscript states this is 'well-established' but provides no quantitative validation or sensitivity analysis showing that the minimum remains within, e.g., ±1 hour for the diverse communities tested (sports, finance, non-Western users, or post-2015 periods). If the true minimum deviates systematically, all reported error metrics would be offset by that amount, undermining the correlation with actual geographic spread.
Authors: We appreciate the referee highlighting this foundational assumption. The 4 a.m. minimum draws on the established circadian rhythm literature, which we cite and describe in the manuscript. We acknowledge that the original submission does not contain a dedicated quantitative sensitivity analysis varying the minimum location across the specific community subsets mentioned. However, our accuracy claims do not rest on the assumption alone; they are empirically validated by comparing inferred time zones against the actual geographic distributions of Reddit users. Systematic deviations from 4 a.m. would produce mismatches in this geographic correlation, which our results do not show. To respond directly to the concern, we will add a sensitivity analysis to the revised manuscript. It will test shifts in the assumed minimum (e.g., 3:00 a.m. to 5:00 a.m. in 30-minute steps) and report their effects on the sub-30-minute accuracy and the one-hour heuristic margin, broken down by community category, including sports, finance, and post-2015 temporal subsets. We will also examine non-Western communities using available subreddit data where sample sizes permit. This addition will quantify robustness and address potential offsets explicitly.
Revision: yes
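A sensitivity sweep of the kind proposed in the response could look roughly like the sketch below. It assumes a list of communities with independently known UTC offsets; the function name `sweep_assumed_minimum`, its parameters, and the toy data are illustrative and are not the authors' analysis.

```python
def sweep_assumed_minimum(communities, start=3.0, stop=5.0, step=0.5):
    """Mean circular error (hours) of the minimum heuristic per assumed hour.

    communities: list of (hourly_counts_utc, true_utc_offset) pairs, where
    hourly_counts_utc holds 24 counts indexed by UTC hour.
    """
    results = {}
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        assumed = start + i * step
        errors = []
        for counts, true_offset in communities:
            utc_min_hour = min(range(24), key=lambda h: counts[h])
            inferred = (assumed - utc_min_hour) % 24
            # Circular distance between inferred and true offsets.
            diff = (inferred - true_offset) % 24
            errors.append(min(diff, 24 - diff))
        results[assumed] = sum(errors) / len(errors)
    return results


# Two toy communities whose minima sit exactly at 4 a.m. local time.
east = [100] * 24
east[9] = 5            # quietest at 09:00 UTC, true offset UTC-5
central_eu = [100] * 24
central_eu[3] = 5      # quietest at 03:00 UTC, true offset UTC+1

for assumed, err in sweep_assumed_minimum([(east, -5), (central_eu, 1)]).items():
    print(f"assumed minimum {assumed:.1f}h -> mean error {err:.1f}h")
```

On this toy data the mean error is zero at 4.0 and grows by exactly the size of the shift, which mirrors the referee's point that a systematic deviation of the true minimum would offset every error metric by the same amount.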
Circularity Check
No significant circularity; inference validated against independent geographic benchmarks
The paper's core pipeline aligns observed activity histograms to a 4 a.m. local minimum drawn from an external, well-established circadian literature finding. Accuracy, saturation at <1000 comments, and geographic-evolution maps are then measured by direct correlation against independent Reddit geographic-spread data and cross-community validation (sports, finance, etc.). No equations, fitted parameters, or self-citations are shown to reduce the reported error metrics or time-zone outputs to the input assumption by construction. The derivation remains externally falsifiable and does not collapse into self-definition or tautological renaming.
Axiom & Free-Parameter Ledger
free parameters (1)
- 4 a.m. activity minimum
axioms (1)
- domain assumption: posting rhythms encode circadian structure
Reference graph
Works this paper leans on
- [1] L. Alarfaj, J. Blackburn, M. Amjad, J. Patel, and Z. Ertem. 2025. Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation. Information 16, 9 (2025), 748. doi:10.3390/info16090748
- [2] Michele Avalle, Niccolò Di Marco, Gabriele Etta, Emanuele Sangiorgio, Shayan Alipour, Anita Bonetti, Lorenzo Alvisi, Antonio Scala, Andrea Baronchelli, Matteo Cinelli, et al. 2024. Persistent interaction patterns across social media platforms and over time. Nature 628, 8008 (2024), 582–589
- [3] Christopher Bail, Lisa Argyle, Taylor Brown, John Bumpus, Haohan Chen, M.B. Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, and Alexander Volfovsky. 2018. Exposure to opposing views can increase political polarization: evidence from a large-scale field experiment on social media. SocArXiv XXX (2018), 1–6. arXiv:1408.1149 doi:10.17605/osf.io/4ygux
- [4] Duilio Balsamo, Paolo Bajardi, and André Panisson. 2019. Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort. The World Wide Web Conference on - WWW ’19 (2019), 2572–2579. arXiv:1904.00003 doi:10.1145/3308558.3313634
- [5] C. Berragan, A. Singleton, A. Calafiore, and J. Morley. 2023. Evaluating the similarity of location-based corpora identified in Reddit comments. In Proceedings of the First Workshop on Geographic Information Extraction from Texts (GeoExT 2023) co-located with ECIR 2023 (CEUR Workshop Proceedings, Vol. 3385). 1–6
- [6] Lia Bozarth, Daniele Quercia, Licia Capra, and Sanja Šćepanović. 2023. The role of the big geographic sort in online news circulation among U.S. Reddit users. Scientific Reports 13, 1 (2023), 6711. doi:10.1038/s41598-023-33247-3
- [7] Stevie Chancellor and Munmun De Choudhury. 2020. Methods in predictive techniques for mental health status on social media: a critical review. npj Digital Medicine 3 (2020). doi:10.1038/s41746-020-0233-7
- [8] Bo-Chiuan Chen, Dong-Chul Seo, Hsien-Chang Lin, and David Crandall. 2018. Framework for estimating sleep timing from digital footprints. BMJ Innovations 4, 4 (2018). doi:10.1136/bmjinnov-2018-000270
- [9] Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013. Predicting postpartum changes in emotion and behavior via social media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI ’13 (2013), 3267. doi:10.1145/2470654.2466447
- [10] Jacob Eisenstein. 2018. Identifying Regional Dialects in On-Line Social Media. The Handbook of Dialectology 2013 (2018), 368–383. doi:10.1002/9781118827628.ch21
- [11] Casey Fiesler and Nicholas Proferes. 2018. “Participant” Perceptions of Twitter Research Ethics. Social Media and Society 4 (2018). doi:10.1177/2056305118763366
- [12] Kambiz Ghoorchian and Sarunas Girdzijauskas. 2018. Spatio-Temporal Multiple Geo-Location Identification on Twitter. In 2018 IEEE International Conference on Big Data. 1895–1902. doi:10.1109/BigData.2018.8622131
- [13]
- [14] M. Hoffmann and A. Heft. 2020. “Here, There and Everywhere”: Classifying Location Information in Social Media Data — Possibilities and Limitations. Communication Methods and Measures 14, 3 (2020), 184–203. doi:10.1080/19312458.2019.1708282
- [15] S Rao Jammalamadaka and Y Ramakrishna Sarma. 1988. A correlation coefficient for angular variables. Statistical theory and data analysis II (1988), 349–364
- [16] Sean Kates, Joshua Tucker, Jonathan Nagler, and Richard Bonneau. 2021. The Times They Are Rarely A-Changin’: Circadian Regularities in Social Media Use. Journal of Quantitative Description: Digital Media 1 (2021). doi:10.51685/jqd.2021.017
- [17] Massimo La Morgia, Alessandro Mei, Eugenio Nerio Nemmi, Simone Raponi, and Julinda Stefa. 2019. Nationality and geolocation-based profiling in the dark (web). IEEE Transactions on Services Computing 15, 1 (2019), 429–441.
Vol. 1, No. 1. Publication date: May 2026. Della Negra et al.
- [18] J. Mahmud, J. Nichols, and C. Drews. 2012. Where Is This Tweet From? Inferring Home Locations of Twitter Users. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 6. 228–235
- [19]
- [20] W. Meyerson, S. Fineberg, Y. Song, A. Faber, G. Ash, F. Andrade, P. Corlett, M. Gerstein, and R. Hoyle. 2023. Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys. JMIR Formative Research 7 (2023), e38112. doi:10.2196/38112
- [21] Corrado Monti, Jacopo D’Ignazi, Michele Starnini, and Gianmarco De Francisci Morales. 2023. Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit. arXiv (2023). arXiv:2302.07598 doi:10.48550/arxiv.2302.07598
- [22] A. Morales, V. Vavilala, Rosa M. Benito, and Y. Bar-Yam. 2017. Global patterns of synchronization in human communications. Journal of The Royal Society Interface 14 (2017). doi:10.1098/rsif.2016.1048
- [23]
- [24] Juergen Pfeffer, Daniel Matter, Kokil Jaidka, Onur Varol, Afra Jahanbakhsh Mashhadi, Jana Lasser, Dennis Assenmacher, Siqi Wu, Diyi Yang, Cornelia Brantner, Daniel M. Romero, Jahna Otterbacher, Carsten Schwemmer, Kenneth Joseph, David Garcia, and Fred Morstatter. 2023. Just Another Day on Twitter: A Complete 24 Hours of Twitter Data. In International Conf...
- [25] Tiziano Piccardi, Martin Gerlach, and Robert West. 2024. Curious rhythms: Temporal regularities of wikipedia consumption. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 18. 1249–1261
- [26] Howard Rheingold. 1993. The Virtual Community: Homesteading on the Electronic Frontier. Addison-Wesley
- [27] T. Scheffler and C. Kyba. 2016. Measuring Social Jetlag in Twitter Data. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 10. 675–678. doi:10.1609/icwsm.v10i1.14789
- [28] Mila Stillman and Anna M. Kruspe. 2024. Geolocation Extraction From Reddit Text Data. In GeoExT@ECIR
- [29] stuck_in_the_matrix, Watchful1, and RaiderBDev. [n. d.]. Reddit comments/submissions 2005-06 to 2024-12. ([n. d.])
- [30] Anna Tigunova, Paramita Mirza, Andrew Yates, and Gerhard Weikum. 2020. RedDust: a Large Reusable Dataset of Reddit User Traits. In Proceedings of the Twelfth Language Resources and Evaluation Conference. https://aclanthology.org/2020.lrec-1.751/
- [31] Sherry Turkle. 1995. Life on the Screen: Identity in the Age of the Internet. Simon & Schuster
- [32] Isaac Waller and Ashton Anderson. 2021. Quantifying social organization and political polarization in online platforms. Nature 600 (2021), 264–268. arXiv:2010.00590 doi:10.1038/s41586-021-04167-x
- [33] Ke Zhou, M. Constantinides, D. Quercia, and S. Šćepanović. 2023. How Circadian Rhythms Extracted from Social Media Relate to Physical Activity and Sleep. Proceedings of the International AAAI Conference on Web and Social Media 17, 1 (2023), 948–959. doi:10.1609/icwsm.v17i1.22202
A Population Location Deconvolution
Assigning a subreddit’s entire user base e...