AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data -- Accepted Version
Pith reviewed 2026-05-25 16:59 UTC · model grok-4.3
The pith
AMIC identifies and ranks multi-scale temporal correlations in big time series using mutual information.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AMIC is a method based on mutual information to identify correlations at multiple temporal scales in large time series. Discovered correlations are suggested to users in an order based on the strength of the relationships. The method supports an adaptive streaming technique that minimizes duplicated computation and is implemented for scalability. Comprehensive evaluation uses both synthetic and real-world data sets to assess effectiveness and scalability.
What carries the argument
The AMIC method, which applies mutual information across different temporal scales to compute and rank correlations by strength while using adaptive streaming to avoid redundant work.
If this is right
- Correlations are ranked by strength to direct user attention to the strongest relationships first.
- The adaptive streaming technique minimizes duplicated computation for efficiency.
- The approach handles the volume and velocity of big data through its scalable implementation.
- Evaluation demonstrates both effectiveness in finding correlations and scalability on large datasets.
Where Pith is reading between the lines
- If the ranking by mutual information strength aligns with domain expert judgment, it could reduce the time spent reviewing irrelevant correlations.
- The multi-scale aspect allows detection of both short-term and long-term relationships in the same analysis.
- Extensions might include applying similar adaptive techniques to other correlation measures beyond mutual information.
Load-bearing premise
Mutual information appropriately captures relevant temporal correlations at multiple scales and the adaptive streaming technique maintains accuracy while minimizing duplicated computation without missing key relationships.
What would settle it
A dataset of synthetic time series with planted known correlations at specific scales, run through AMIC to verify if the method recovers and correctly ranks them without omissions from the streaming adaptation.
Figures
read the original abstract
Recent development in computing, sensing and crowd-sourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called Big Data to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to bring out the most value of Big Data. A first important step is to identify temporal correlations between data sets. Given the characteristics of Big Data in terms of volume and velocity, techniques that identify correlations not only need to be fast and scalable, but also need to help users in ordering the correlations across temporal scales so that they can focus on important relationships. In this paper, we present AMIC (Adaptive Mutual Information-based Correlation), a method based on mutual information to identify correlations at multiple temporal scales in large time series. Discovered correlations are suggested to users in an order based on the strength of the relationships. Our method supports an adaptive streaming technique that minimizes duplicated computation and is implemented on top of Apache Spark for scalability. We also provide a comprehensive evaluation on the effectiveness and the scalability of AMIC using both synthetic and real-world data sets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AMIC, an Adaptive Mutual Information-based Correlation method for identifying multi-scale temporal correlations in big time series data. It orders discovered correlations by relationship strength, uses an adaptive streaming technique to minimize duplicated computation, implements the approach on Apache Spark for scalability, and provides evaluation on synthetic and real-world data sets.
Significance. Should the method prove effective, it would offer a scalable, information-theoretic approach to prioritizing correlations across temporal scales in large datasets, which is relevant for big data analytics in distributed computing contexts.
minor comments (1)
- [Abstract] Abstract: the claim of a 'comprehensive evaluation' on effectiveness and scalability is stated without reference to specific metrics, baselines, or dataset characteristics that would allow assessment of the results.
Simulated Author's Rebuttal
We thank the referee for their review of our manuscript on AMIC. The summary accurately captures the method's adaptive mutual information approach, ordering by relationship strength, streaming support, Spark implementation, and evaluation. The significance assessment aligns with our goals for scalable multi-scale correlation discovery in big time series data. The recommendation is listed as uncertain with no specific major comments provided in the report.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces AMIC as a mutual-information-based method for multi-scale temporal correlations in time series, with an adaptive streaming layer on Spark and evaluation on synthetic plus real-world datasets. No load-bearing steps reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains. The central claims rest on external data evaluation rather than internal equivalence to inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
A. Agresti and B. Finlay, Statistical Methods for the Social Sciences. Pearson Education Limited, 2014
work page 2014
- [2]
-
[3]
Application of some correla tion coefficient techniques to time-series analysis,
W. E. Dean Jr and R. Y. Anderson, "Application of some correla tion coefficient techniques to time-series analysis," Journal of the International Association for Mathematical Geology, vol. 6, no. 4, pp. 363-372, 1974
work page 1974
-
[4]
H.-C. Huang, S. Zheng, and Z. Zhao, "Application of pearson correlation coefficient (pee) and kolmogorov-smirnov distance (ksd) metrics to identify disease-specific biomarker genes," BMC Bioinformatics, vol. 11, no. 4, p. 1, 2010
work page 2010
-
[5]
Analysis of covariance with qualitative data,
G. Chamberlain, "Analysis of covariance with qualitative data," 1979
work page 1979
-
[6]
Correlation analy sis of spatial time series datasets: A filter-and-refine approach,
P. Zhang, Y. Huang, S. Shekhar, and V. Kumar, "Correlation analy sis of spatial time series datasets: A filter-and-refine approach," in PAKDD Proc., 2003
work page 2003
-
[7]
Spatio-temporal correlation: theory and applications for wireless sensor networks,
M. C. Vuran, 6. B. Akan, and I. F. Akyildiz, "Spatio-temporal correlation: theory and applications for wireless sensor networks," Computer Networks, vol. 45, no. 3, pp. 245-259, 2004
work page 2004
-
[8]
Spatio-temporal correlation-based fast coding unit depth decision for high efficiency video coding,
C. Zhou, F. Zhou, and Y. Chen, "Spatio-temporal correlation-based fast coding unit depth decision for high efficiency video coding," Journal of Electronic Imaging, vol. 22, no. 4, pp. 043 001-043 001, 2013
work page 2013
-
[9]
Spatiotemporal models for data-anomaly detection in dynamic environmental monitoring campaigns,
E. W. Dereszynski and T. G. Dietterich, "Spatiotemporal models for data-anomaly detection in dynamic environmental monitoring campaigns," ACM Transactions on Sensor Networks (TOSN), vol. 8, no. 1, p. 3, 2011
work page 2011
-
[10]
Towards sustainable solutions for applications in cloud computing and big data,
T. T. N. HO, "Towards sustainable solutions for applications in cloud computing and big data," in Doctoral dissertation. Politec nico di Milano, Italy, 2017, http:/ /hdl.handle.net/10589/131740
work page 2017
-
[11]
T. T. N. Ho and B. Pernici, "A data-value-driven adaptation framework for energy efficiency for data intensive applications in clouds," in Technologies for Sustainability (SusTech), 2015 IEEE Conference on. IEEE, 2015, pp. 47-52
work page 2015
-
[12]
A. Das Sarma, L. Fang, N. Gupta, A. Halevy, H. Lee, F. Wu, R. Xin, and C. Yu, "Finding related tables," in S/GMOD Proc., 2012, pp. 817-828
work page 2012
-
[13]
Fusing data with correlations,
R. Pochampally, A. Das Sarma, X. L. Dong, A. Meliou, and D. Srivastava, "Fusing data with correlations," in S/GMOD Proc., 2014
work page 2014
-
[14]
Helping scientists reconnect their datasets,
A. Alawini, D. Maier, K. Tufte, and B. Howe, "Helping scientists reconnect their datasets," in SSDBM Proc., 2014
work page 2014
-
[15]
A formal approach to finding explanations for database queries,
S. Roy and D. Suciu, "A formal approach to finding explanations for database queries," in SIGMOD Proc., 2014
work page 2014
-
[16]
a-clusters: Capturing subspace correlation in a large data set,
J. Yang, W. Wang, H. Wang, and P. Yu, "a-clusters: Capturing subspace correlation in a large data set," in Data Engineering Proc., 2002. Copyright ( c) 2019 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org. This is the author's version of an article that has been publi...
-
[17]
A fast and effective method to find correlations among attributes in databases,
E. P. de Sousa, C. Traina Jr, A. J. Traina, L. Wu, and C. Faloutsos, "A fast and effective method to find correlations among attributes in databases," Data Mining and Knowledge Discovery, vol. 14, no. 3, pp. 367-407, 2007
work page 2007
-
[18]
Efficient sen tinel mining using bitmaps on modern processors,
M. Middelfart, T. B. Pedersen, and J. Krogsgaard, "Efficient sen tinel mining using bitmaps on modern processors," IEEE Transac tions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2231- 2244, 2013
work page 2013
-
[19]
Dat a polygamy: the many-many relationships among urban spatio temporal data sets,
F. Chirigati, H. Dor aiswamy, T. Damoulas, and J. Freire, "Dat a polygamy: the many-many relationships among urban spatio temporal data sets," in SIGMOD Proc., 2016
work page 2016
-
[20]
D. Schulz and J. P. Huston, "Th e sliding wi ndow correlation procedure for detecting hidden corr elations: existence of behav ioral subgroups illustrated with aged rats, " Journal of neuroscience methods, vol. 121, no. 2, pp. 129-137, 2002
work page 2002
-
[21]
Fast wi ndow correlations over uncooperative time series,
R. Cole, D. Shasha, and X. Zhao, " Fast wi ndow correlations over uncooperative time series," in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. AC M, 2005, pp. 743-749
work page 2005
-
[22]
Estimating mutual information on data streams,
F. Keller, E. Mi.iller, and K. Bol:un, " Estimating mutual information on data streams," in SSDBM Proc., 2015
work page 2015
-
[23]
Local correla tion detection wi th linearity enhancement in streaming data,
Q. Xie, S. Shang, B. Yuan, C. Pang, and X. Zhang, " Local correla tion detection wi th linearity enhancement in streaming data," in Proceedings of the 22nd ACM international conference on Information & Knowledge Management. ACM, 2013, pp. 309-318
work page 2013
-
[24]
M. Bermudez-Edo, P. Barnaghi, and K. Moessner, "Ana ly sing real world data streams w ith spatio-temporal correlations: Entropy vs. pearson correlation," Automation in Construction, vol. 88, pp. 87- 100, 2018
work page 2018
-
[25]
H. Peng, F. Long, and C. Ding, "F eature selection based on mutual infor mation criteria of max-dependency, max-relevance, and min-redundancy," IEEE Trans. on pattern analysis and machine intelligence, vol. 27, no. 8, pp. 1226-1238, 2005
work page 2005
-
[26]
Normal ized mutual information feature selection,
P. A. Estevez, M. Tesmer, C. A. Perez, and J. M. Zurada, "Normal ized mutual information feature selection," IEEE Trans. on Neural Networks, vol. 20, no. 2, pp. 189-201, 2009
work page 2009
-
[27]
Infor mation based clustering,
N. Slonim, G. S. Atwal, G. Tkacik, and W. Bialek, "Infor mation based clustering," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 51, pp. 18 297-18 302, 2005
work page 2005
-
[28]
An information-theoretic approach to quantitative association rule mining,
Y. Ke, J. Cheng, and W. N g, "An information-theoretic approach to quantitative association rule mining," Knowledge and Information Systems, vol. 16, no. 2, pp. 213-244, 2008
work page 2008
-
[29]
Mutual infor mation-based registration of medical images: a survey,
J. P. Pluim, J. A. Maintz, and M. A. Viergever, " Mutual infor mation-based registration of medical images: a survey," IEEE Trans. on Medical Imaging, vol. 22, no. 8, pp. 986-1004, 2003
work page 2003
-
[30]
Information theoretic inference of large transcriptional regulatory networks,
P. E. Meyer, K. Kontos, F. Lafitt e, and G. Bontempi, " Information theoretic inference of large transcriptional regulatory networks," EURASIP journal on bioinformatics and systems biology, vol. 2007, no. 1, pp. 1-9, 2007
work page 2007
-
[31]
A. A. Margolin, I. N emenman, K. Basso, C. Wiggins, G. Stolovitzky, R. D. Favera, and A. Califano, " Aracne: an al goritl:un for the reconstruction of gene regulatory networks in a mammalian cellular context," BMC bioinformatics, vol. 7, no. Suppl 1, p. S7, 2006
work page 2006
-
[32]
D. J. Alber s and G. Hr ipcsak, "Us ing time-delayed mutual infor mation to discover and interpret temporal correlation structure in complex populations," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 22, no. 1
-
[33]
Spa tiotemporal dynamics of the magnetosphere during geospace storms: Mutual information analysis,
J. Chen, A. Sharma, J. Edwards, X. Shao, and Y. Kamide, "Spa tiotemporal dynamics of the magnetosphere during geospace storms: Mutual information analysis," Journal of Geophysical Re search: Space Physics, vol. 113, no. AS, 2008
work page 2008
-
[34]
Supporting correlation analysis on scientific datasets in parallel and distributed settings,
Y. Su, G. Ag raw al, J. Woodring, A. Biswas, and H.-W. Shen, "Supporting correlation analysis on scientific datasets in parallel and distributed settings," in HPDC Proc., 2014
work page 2014
-
[35]
An adaptive information-theoretic approach for identifying tempor al correlations in big data sets,
N. Ho, H. Vo, and M. Vu, "An adaptive information-theoretic approach for identifying tempor al correlations in big data sets," in Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 666-675
work page 2016
-
[36]
T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley&Sons,2012
work page 2012
-
[37]
Some data analyses using mutual information,
D. R. Brillinger, "Some data analyses using mutual information," Brazilian Journal of Probability and Statistics, pp. 163-182, 2004
work page 2004
-
[38]
S. de Siqueira Santos, D. Y. Takahashi, A. Nakata, and A. Fujita, "A comparative study of statistical methods used to identify dependencies between gene expression signals," Briefings in bioin formatics, vol. 15, no. 6, pp. 906-918, 2013. 18
work page 2013
-
[39]
Estimation of entropy and mutual infor mation,
L. Paninski, "Estimation of entropy and mutual infor mation," Neural computation, vol. 15, no. 6, pp. 1191-1253, 2003
work page 2003
-
[40]
Estimating mutual information,
A. Kraskov, H. Stogbauer, and P. Grassberger, "Estimating mutual information," Physical review E, vol. 69, no. 6, 2004
work page 2004
-
[41]
Ev aluation of mutual information estimators for time series,
A. Papana and D. Kugiumtzis, "Ev aluation of mutual information estimators for time series," International Journal of Bifurcation and Chaos, vol. 19, no. 12, pp. 4197-4215, 2009
work page 2009
-
[42]
M. Vejmelka and K. Hlavackova-Schindler, "Mutual information estimation in higher dimensions: A speed-up of a k-nearest neigh bor based estimator," in ICANNGA Proc
-
[43]
Efficient neighbor searching in nonlinear time series analysis,
T. Schreiber, "Efficient neighbor searching in nonlinear time series analysis," International Journal of Bifurcation and Chaos, vol. 05, no. 02, pp. 349-358, 1995
work page 1995
-
[44]
Probability distributions and maximum entr opy,
K. Conrad, "Probability distributions and maximum entr opy," Entropy, vol. 6, no. 452, 2004
work page 2004
- [45]
- [46]
-
[47]
A mutual information approach to calculating nonlin earity,
R. Smith, "A mutual information approach to calculating nonlin earity," Stat, vol. 4, no. 1, pp. 291-303, 2015
work page 2015
-
[48]
M. Hazewinkel, "Linear interpolation," in Encyclopaedia of Mathe matics. Springer Science & Business Media, 1990
work page 1990
-
[49]
S. Velazquez, J. A. Carta, and J. Matias, "Comparison between anns and linear mcp algorithms in the long-term estimation of the cost per kwh produced by a wind turbine at a candidate site: a case study in the canary islands," Applied energy, vol. 88, no. 11, pp. 3869-3881, 2011
work page 2011
-
[50]
Measuring and testing dependence by correlation of distances,
G. J. Szekely, M. L. Rizzo, and N. K. Bakirov, "Measuring and testing dependence by correlation of distances," The annals of statistics, pp. 2769-2794, 2007. Nguyen Ho is a Postdoc Research Associate at the Center for Data-Intensive Systems (Daisy) at the Department of Computer Science, Aalborg University, Denmark. Her research focuses on Big Data Analyt...
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.