arxiv: 1709.04290 · v1 · pith:2MRJG4F7new · submitted 2017-09-13 · 💻 cs.SI · cs.DB

Approximate Integration of streaming data

classification 💻 cs.SI cs.DB
keywords: approximate, streams, edges, community, communities, correlation, data, define

We approximate analytic queries on streaming data using weighted reservoir sampling. For a stream of tuples from a data warehouse, we show how to approximate some OLAP queries. For a stream of graph edges from a social network, we approximate the communities as the large connected components of the edges in the reservoir. We show that, for a model of random graphs that follow a power-law degree distribution, the community detection algorithm is a good approximation. Given two streams of graph edges from two sources, we define the Community Correlation as the fraction of the nodes in communities in both streams. Although we do not store the edges of the streams, we can approximate the Community Correlation and define the Integration of two streams. We illustrate this approach with Twitter streams associated with TV programs.
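
The pipeline described in the abstract (sample edges into a reservoir, take large connected components as communities, compare the community nodes of two streams) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The weighting scheme is not given in the abstract, so the sketch uses the Efraimidis-Spirakis key method with a caller-supplied `weight` function that defaults to 1.0. The size threshold `min_size` is hypothetical. The abstract does not specify the denominator of the Community Correlation, so the sketch assumes the shared community nodes divided by the union of community nodes. All function names are illustrative.

```python
import heapq
import random
from collections import defaultdict


def weighted_reservoir(stream, k, weight=lambda item: 1.0, rng=None):
    """Keep at most k items using Efraimidis-Spirakis keys u ** (1 / w).

    Assumption: the weight function is a placeholder; the abstract does
    not say how the edges are weighted.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item)
    for i, item in enumerate(stream):
        w = weight(item)
        if w <= 0:
            continue  # items with non-positive weight are never sampled
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]


def large_components(edges, min_size):
    """Return connected components (sets of nodes) with >= min_size nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [c for c in groups.values() if len(c) >= min_size]


def community_correlation(communities_a, communities_b):
    """Assumed definition: shared community nodes over all community nodes."""
    nodes_a = set().union(*communities_a)
    nodes_b = set().union(*communities_b)
    union = nodes_a | nodes_b
    return len(nodes_a & nodes_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy edge streams; real input would be, e.g., edges derived from tweets.
    stream_a = [(1, 2), (2, 3), (3, 4), (10, 11)]
    stream_b = [(3, 4), (4, 5), (5, 6)]
    comms_a = large_components(weighted_reservoir(stream_a, k=10), min_size=3)
    comms_b = large_components(weighted_reservoir(stream_b, k=10), min_size=3)
    print(community_correlation(comms_a, comms_b))
```

In this toy run the reservoir capacity exceeds the stream length, so every edge is retained and the result is deterministic. With a reservoir smaller than the stream, the components and the correlation become random estimates that depend on the sample.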
