Graphs in many applications, such as social networks and IoT, are inherently streaming, involving continuous additions and deletions of vertices and edges at high rates. Constructing random walks in a graph, i.e., sequences of vertices selected with a specific probability distribution, is a prominent task in many of these graph applications as well as machine learning (ML) on graph-structured data. In a streaming scenario, random walks need to constantly keep up with the graph updates to avoid stale walks and thus, performance degradation in the downstream tasks. We present Wharf, a system that efficiently stores and updates random walks on streaming graphs. It avoids a potential size explosion by maintaining a compressed, high-throughput, and low-latency data structure. It achieves (i) the succinct representation by coupling compressed purely functional binary trees and pairing functions for storing the walks, and (ii) efficient walk updates by effectively pruning the walk search space. We evaluate Wharf, with real and synthetic graphs, in terms of throughput and latency when updating random walks. The results show the high superiority of Wharf over inverted index- and tree-based baselines.
翻译:许多应用程序中的图表,例如社交网络和IoT, 都具有内在的分流性, 包括连续增加和删除脊椎和边缘。 在图形中构造随机行走, 即以特定概率分布选择的脊椎序列, 在许多图形应用中是一项突出的任务, 以及图形结构数据中的机器学习( ML) 。 在流态情景中, 随机行走需要与图表更新不断同步, 以避免无色行走, 从而导致下游任务的性能退化。 我们展示了Wharf, 这个系统高效存储并更新流动图中的随机行走。 它通过保持压缩、 高通量和低纬度数据结构来避免潜在的规模爆炸。 它实现了 (一) 通过将压缩的纯功能二进式双树和配对功能合并以保存行走道, 以及 (二) 通过有效修整行走搜索空间来高效行走更新行走。 我们用真实和合成的图表来评估行走, 在随机行走时, 显示过量和粘性图表, 。 结果显示树的高度基线的高度。