Influence maximization aims to select the k most influential vertices, or seeds, in a network, where influence is defined by a given diffusion process. Although computing an optimal seed set is NP-hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs a large memory footprint, which in turn limits both the problem sizes that can be handled and the approximation quality. In this work, we study the memory footprint of the sampling process that collects reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach, HBMax, built on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. HBMax uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. It then compresses the intermediate reverse reachability information with Huffman coding or bitmap coding and answers queries either on partially decoded data or directly on the compressed data, preserving the memory savings obtained through compression. Taking the NUMA architecture into account, we scale our solution up to 64 CPU cores and reduce the memory footprint by up to 82.1% with an average speedup of 6.3% (the encoding overhead is offset by the performance gain from the reduced memory footprint) and no loss of accuracy. For the largest tested graph, Twitter7 (1.4 billion edges), HBMax achieves a 5.9X compression ratio and a 2.2X speedup.
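To make the idea of querying directly on compressed reverse reachability data concrete, the following is a minimal Python sketch of one possible bitmap layout. It is an illustration under stated assumptions, not HBMax's or Ripples' actual implementation: the function names `encode_bitmaps` and `greedy_seeds` are hypothetical, each vertex is assumed to get one bitmap with a bit per RR set, and seed selection is assumed to be the standard greedy maximum-coverage step over RR sets.

```python
from collections import defaultdict


def encode_bitmaps(rr_sets):
    """Map each vertex to an int whose bit j is set iff the vertex is in RR set j.

    Hypothetical layout chosen for illustration; the paper's encoding may differ.
    """
    bitmaps = defaultdict(int)
    for j, rr in enumerate(rr_sets):
        for v in rr:
            bitmaps[v] |= 1 << j
    return dict(bitmaps)


def greedy_seeds(bitmaps, k):
    """Greedy max-coverage over RR sets, operating only on the bitmaps."""
    covered = 0  # bit j is set once RR set j contains a chosen seed
    seeds = []
    for _ in range(k):
        best_v, best_gain = None, 0
        for v in sorted(bitmaps):  # sorted for deterministic tie-breaking
            # Marginal gain: RR sets containing v that are not yet covered.
            gain = bin(bitmaps[v] & ~covered).count("1")
            if gain > best_gain:
                best_v, best_gain = v, gain
        if best_v is None:  # every RR set is already covered
            break
        seeds.append(best_v)
        covered |= bitmaps[best_v]
    return seeds, bin(covered).count("1")


# Toy input: five RR sets over vertices 0..3.
rr_sets = [[0, 1], [1, 2], [2], [3], [1, 3]]
seeds, num_covered = greedy_seeds(encode_bitmaps(rr_sets), k=2)
```

In this sketch the marginal gain of a vertex is a bitwise AND followed by a population count, so the RR sets never need to be decoded back into vertex lists during seed selection.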