
arxiv: 1906.09995 · v2 · pith:P4Y27ACUnew · submitted 2019-06-24 · 💻 cs.DC

AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data -- Accepted Version

Pith reviewed 2026-05-25 16:59 UTC · model grok-4.3

classification 💻 cs.DC
keywords mutual information · time series correlation · multi-scale analysis · big data · adaptive streaming · temporal correlations · scalable correlation detection

The pith

AMIC identifies and ranks multi-scale temporal correlations in big time series using mutual information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AMIC as a technique to detect correlations between large time series datasets at multiple time scales. It uses mutual information to measure these relationships and orders the discoveries by their strength so users can attend to the most significant ones first. An adaptive streaming technique reduces repeated calculations, supporting scalability for high-volume data. A sympathetic reader would care because analyzing big data requires efficient ways to uncover hidden relationships across scales without exhaustive manual effort.

Core claim

AMIC is a method based on mutual information to identify correlations at multiple temporal scales in large time series. Discovered correlations are suggested to users in an order based on the strength of the relationships. The method supports an adaptive streaming technique that minimizes duplicated computation and is implemented for scalability. Comprehensive evaluation uses both synthetic and real-world data sets to assess effectiveness and scalability.

What carries the argument

The AMIC method, which applies mutual information across different temporal scales to compute and rank correlations by strength while using adaptive streaming to avoid redundant work.
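
The idea of scoring windows at several temporal scales and ranking them by strength can be sketched as follows. This is a minimal illustration, not AMIC itself: it assumes a simple equal-width histogram (plug-in) estimator of mutual information, and the names `binned_mi` and `rank_windows`, the bin count, and the toy data are all hypothetical. Plug-in estimates are biased upward on short windows, so a real comparison across scales would need a correction that this sketch omits.

```python
import math
import random
from collections import Counter


def binned_mi(xs, ys, bins=4):
    """Plug-in mutual information (in nats) from an equal-width 2-D histogram."""
    def to_bins(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0  # guard against a constant series
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def rank_windows(xs, ys, scales, step=None):
    """Score every window at every scale; return (mi, size, start) strongest first."""
    results = []
    for size in scales:
        stride = step or size  # non-overlapping windows by default
        for start in range(0, len(xs) - size + 1, stride):
            mi = binned_mi(xs[start:start + size], ys[start:start + size])
            results.append((mi, size, start))
    return sorted(results, reverse=True)


# Toy data (an assumption for illustration): y tracks x only on [200, 400).
rng = random.Random(7)
xs = [rng.random() for _ in range(600)]
ys = [x + rng.gauss(0, 0.05) if 200 <= t < 400 else rng.random()
      for t, x in enumerate(xs)]

ranked = rank_windows(xs, ys, scales=[100, 200])
best_mi, best_size, best_start = ranked[0]
print(f"strongest window: start={best_start}, size={best_size}, MI={best_mi:.3f}")
```

Sorting the `(mi, size, start)` tuples in descending order is what puts the strongest relationship at the top of the list a user would review.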

If this is right

  • Correlations are ranked by strength to direct user attention to the strongest relationships first.
  • The adaptive streaming technique minimizes duplicated computation for efficiency.
  • The approach handles the volume and velocity of big data through its scalable implementation.
  • Evaluation demonstrates both effectiveness in finding correlations and scalability on large datasets.
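
To make "minimizing duplicated computation" concrete, here is a minimal sketch of one way a sliding-window estimate can reuse earlier work. It is a stand-in, not AMIC's mechanism: the Figure 5 caption suggests the paper reasons about nearest-neighbor regions, whereas this sketch assumes already-discretized symbols and simply updates histogram counts as the window slides. The class `SlidingMI` and helper `batch_mi` are hypothetical names.

```python
import math
from collections import Counter, deque


def batch_mi(pairs):
    """Reference implementation: recompute MI (nats) from scratch for a list of pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


class SlidingMI:
    """MI over the last `size` (x, y) symbol pairs, with counts updated incrementally."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.joint, self.px, self.py = Counter(), Counter(), Counter()

    def _bump(self, x, y, delta):
        for counter, key in ((self.joint, (x, y)), (self.px, x), (self.py, y)):
            counter[key] += delta
            if counter[key] == 0:
                del counter[key]  # keep only occupied cells

    def push(self, x, y):
        """Add one pair; evict the oldest pair once the window is full."""
        self.window.append((x, y))
        self._bump(x, y, +1)
        if len(self.window) > self.size:
            old_x, old_y = self.window.popleft()
            self._bump(old_x, old_y, -1)

    def mi(self):
        """Current MI; only the occupied cells are visited, not the raw window."""
        n = len(self.window)
        return sum(c / n * math.log(c * n / (self.px[a] * self.py[b]))
                   for (a, b), c in self.joint.items())


# Toy symbol stream (an assumption for illustration).
stream = [(t % 3, (t // 2) % 2) for t in range(50)]
tracker = SlidingMI(size=20)
for x, y in stream:
    tracker.push(x, y)
print(tracker.mi(), batch_mi(stream[-20:]))
```

Each `push` touches a constant number of counters instead of rescanning the whole window, and the result agrees with the from-scratch computation up to floating-point rounding.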

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the ranking by mutual information strength aligns with domain expert judgment, it could reduce the time spent reviewing irrelevant correlations.
  • The multi-scale aspect allows detection of both short-term and long-term relationships in the same analysis.
  • Extensions might include applying similar adaptive techniques to other correlation measures beyond mutual information.

Load-bearing premise

Mutual information appropriately captures relevant temporal correlations at multiple scales and the adaptive streaming technique maintains accuracy while minimizing duplicated computation without missing key relationships.
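
For reference, the standard textbook definition of mutual information for discrete variables X and Y (stated here from general information theory, not quoted from the paper) is:

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```

It is non-negative and equals zero exactly when X and Y are independent, which is why it can register nonlinear dependence that a linear correlation coefficient may miss.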

What would settle it

A dataset of synthetic time series with planted known correlations at specific scales, run through AMIC to verify if the method recovers and correctly ranks them without omissions from the streaming adaptation.
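
A test of this kind could be set up roughly as below. This is a sketch under stated assumptions: the generator `planted_pair`, the quadratic dependence, the noise level, and the overlap rule in `recovered` are all hypothetical choices for illustration, and the list of "found" windows stands in for whatever a correlation detector would report.

```python
import random


def planted_pair(n, windows, noise=0.1, seed=0):
    """Two independent Gaussian series, with y made dependent on x inside each window.

    `windows` is a list of (start, size) pairs giving the planted ground truth.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    for start, size in windows:
        for t in range(start, start + size):
            # Nonlinear (quadratic) dependence: hard for linear correlation to see.
            ys[t] = xs[t] ** 2 + rng.gauss(0, noise)
    return xs, ys


def recovered(truth, found, min_overlap=0.5):
    """Planted windows that some reported window covers by >= min_overlap of their length."""
    hits = []
    for t_start, t_size in truth:
        for f_start, f_size in found:
            overlap = min(t_start + t_size, f_start + f_size) - max(t_start, f_start)
            if overlap >= min_overlap * t_size:
                hits.append((t_start, t_size))
                break
    return hits


truth = [(100, 50), (400, 200)]           # one short-scale and one long-scale window
xs, ys = planted_pair(1000, truth)
found = [(90, 60), (700, 50)]             # placeholder for a detector's output
hits = recovered(truth, found)
print(f"recovered {len(hits)} of {len(truth)} planted windows")
```

Planting windows of different lengths is what lets the check distinguish a missed short-scale relationship from a missed long-scale one.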

Figures

Figures reproduced from arXiv: 1906.09995 by Huy Vo, Mai Vu, Nguyen Ho, Torben Bach Pedersen.

Figure 5
Figure 5 illustrates the influenced region and influenced marginal region concepts, and explains how they can help to minimize computational cost. Consider a data set of seven data points p0, ..., p6 with their locations projected into boxed-array as in Fig. 5a. Let p0 (in red) be the reference point under monitoring, k = 2 be the nearest neighbor parameter, and the maximum norm be the distance metric between neig…
Figure 7
Figure 7: Mutual Information with different k. 5.2 Parameters Setting. Value of k: We use k ranging from 1 to 20 to compute MI for the variables extracted from the real data sets. The MI values produced by different k are compared together. We found that k between 1 and 4 produces high variance of MI value, while the MI becomes more stable with k from 5 to 10. The value k = 6 gives the most stable result, thus, is s…
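
The role of k can be illustrated with a brute-force sketch of a k-nearest-neighbor mutual information estimator. This assumes the estimator in question is the first algorithm of Kraskov, Stögbauer, and Grassberger, which is a guess based on the captions mentioning a nearest neighbor parameter k and the maximum norm; the function names `ksg_mi` and `digamma_int` are hypothetical, and the O(N²) search is for clarity only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def ksg_mi(xs, ys, k=6):
    """Brute-force k-NN estimate of I(X;Y) in nats for two scalar series."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distance from point i to every other point in the joint space.
        dists = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count neighbors strictly inside that distance in each marginal.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Toy data (an assumption for illustration): one independent and one correlated pair.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(300)]
b = [rng.gauss(0, 1) for _ in range(300)]
c = [0.9 * x + math.sqrt(1 - 0.81) * rng.gauss(0, 1) for x in a]

mi_indep = ksg_mi(a, b)
mi_dep = ksg_mi(a, c)
print(f"independent: {mi_indep:.3f}  correlated: {mi_dep:.3f}")
```

Small k makes each point's neighborhood tiny and the estimate noisy, while larger k averages over more neighbors; sweeping k in a loop over `ksg_mi` is one way to reproduce the kind of stability comparison the caption describes.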
Figure 12
Figure 12: Taxi Fare vs. Rain Precipitation
Figure 15
Figure 15: Taxi Trips vs. Collisions. … and 311 complaints are not correlated overall, but a weak positive correlation is found in the extracted windows. This might suggest the presence of hidden variables in the periods where these two are weakly correlated. Additionally, our findings suggest that the number of complaints made by 311 calls has a daily periodic pattern, where the complaints are significantly higher…
Figure 17
Figure 17: Stress Test and Scalability Test on Spark cluster. Summary: In this section, we have performed an extensive evaluation on the performance of AMIC, verifying its capability in addressing Big Data challenges: variety, volume, velocity, and scalability. Specifically, the use of MI to measure correlations allows AMIC to uncover different types of relations and to work on any types of data, and thus to tackle t…
Original abstract

Recent developments in computing, sensing and crowd-sourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called Big Data to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to bring out the most value of Big Data. A first important step is to identify temporal correlations between data sets. Given the characteristics of Big Data in terms of volume and velocity, techniques that identify correlations not only need to be fast and scalable, but also need to help users in ordering the correlations across temporal scales so that they can focus on important relationships. In this paper, we present AMIC (Adaptive Mutual Information-based Correlation), a method based on mutual information to identify correlations at multiple temporal scales in large time series. Discovered correlations are suggested to users in an order based on the strength of the relationships. Our method supports an adaptive streaming technique that minimizes duplicated computation and is implemented on top of Apache Spark for scalability. We also provide a comprehensive evaluation on the effectiveness and the scalability of AMIC using both synthetic and real-world data sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript introduces AMIC, an Adaptive Mutual Information-based Correlation method for identifying multi-scale temporal correlations in big time series data. It orders discovered correlations by relationship strength, uses an adaptive streaming technique to minimize duplicated computation, implements the approach on Apache Spark for scalability, and provides evaluation on synthetic and real-world data sets.

Significance. Should the method prove effective, it would offer a scalable, information-theoretic approach to prioritizing correlations across temporal scales in large datasets, which is relevant for big data analytics in distributed computing contexts.

minor comments (1)
  1. [Abstract] Abstract: the claim of a 'comprehensive evaluation' on effectiveness and scalability is stated without reference to specific metrics, baselines, or dataset characteristics that would allow assessment of the results.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review of our manuscript on AMIC. The summary accurately captures the method's adaptive mutual information approach, ordering by relationship strength, streaming support, Spark implementation, and evaluation. The significance assessment aligns with our goals for scalable multi-scale correlation discovery in big time series data. The recommendation is listed as uncertain with no specific major comments provided in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces AMIC as a mutual-information-based method for multi-scale temporal correlations in time series, with an adaptive streaming layer on Spark and evaluation on synthetic plus real-world datasets. No load-bearing steps reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains. The central claims rest on external data evaluation rather than internal equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no details on specific parameters, axioms, or entities; any thresholds for scales or mutual information cutoffs would be free parameters but cannot be identified here.


