基于计分和计分策略的快速和误差加速影响最大化 (Fast and Error-Adaptive Influence Maximization based on Count-Distinct Sketches)

Influence maximization (IM) is the problem of finding a seed vertex set that maximizes the expected number of vertices influenced under a given diffusion model. Due to the NP-Hardness of finding an optimal seed set, approximation algorithms are frequently used for IM. In this work, we describe a fast, error-adaptive approach that leverages Count-Distinct sketches and hash-based fused sampling. To estimate the number of influenced vertices throughout a diffusion, we use per-vertex Flajolet-Martin sketches where each sketch corresponds to a sampled subgraph. To efficiently simulate the diffusions, the reach-set cardinalities of a single vertex are stored in memory in a consecutive fashion. This allows the proposed algorithm to estimate the number of influenced vertices in a single step for simulations at once. For a faster IM kernel, we rebuild the sketches in parallel only after observing estimation errors above a given threshold. Our experimental results show that the proposed algorithm yields high-quality seed sets while being up to 119x faster than a state-of-the-art approximation algorithm. In addition, it is up to 62x faster than a sketch-based approach while producing seed sets with 3%-12% better influence scores

翻译：影响最大化(IM) 是寻找种子顶点设置的问题, 使在特定扩散模型下受影响的顶点数量最大化。由于NP- 硬性地寻找最佳种子组, 经常对IM 使用近似算法。在这项工作中, 我们描述一种快速的、错误的适应性方法, 借助于计分分分解草图和散射基引信取样。为了估计扩散过程中受影响的顶点数量, 我们使用每张草图与抽样子图对应的每面顶点 Flajolet- Martin 草图。为了有效地模拟扩散, 单个顶点的顶点的达定基点以连续的方式存储在记忆中。这样, 拟议的算法可以一次用单步来估计受影响的顶点数。为了更快的 IM 内核, 我们仅在观察了超过给定临界点的估计误差之后, 才会同时重建这些草点。我们的实验结果显示, 提议的算法产生高品质种子组, 其速度比以119x的速度要快, 将一个更快速的种子组保存到以62 % 的种子测算。