The amount of collected information on data repositories has vastly increased with the advent of the internet. It has become increasingly complex to deal with these massive data streams due to their sheer volume and the throughput of incoming data. Many of these data streams are mapped into graphs, which helps discover some of their properties. However, due to the difficulty in processing massive streaming graphs, they are summarized such that their properties can be approximately evaluated using the summaries. gSketch, TCM, and gMatrix are some of the major streaming graph summarization techniques. Our primary contribution is devising kMatrix, which is much more memory efficient than existing streaming graph summarization techniques. We achieved this by partitioning the allocated memory using a sample of the original graph stream. Through the experiments, we show that kMatrix can achieve a significantly less error for the queries using the same space as that of TCM and gMatrix.
翻译:随着互联网的出现,在数据储存库中收集的信息量大大增加,处理这些庞大的数据流变得日益复杂,因为数据流数量庞大,而且输入的数据量也很高。许多数据流被映射成图表,有助于发现其中的一些属性。然而,由于处理大量流图的困难,对数据储存库中收集的信息量进行了总结,以便利用摘要对其属性进行大致评估。gSketch、TCM和gMatrix是主要流图图形汇总技术的一部分。我们的主要贡献是设计了 kMatrix, 其记忆效率远高于现有的流图图形汇总技术。我们通过使用原始图表流的样本对分配的内存进行分割,从而实现了这一点。我们通过实验表明, KMatrix在使用与 TCM 和 gMatrix 相同的空间进行查询时,其属性的误差要小得多。