We study the task of selecting $k$ nodes in a social network of size $n$ to seed a diffusion with maximum expected spread size, under the independent cascade model with cascade probability $p$. Most previous work on this problem (known as influence maximization) focuses on efficient algorithms that approximate the optimal seed set with provable guarantees, given knowledge of the entire network; however, obtaining full knowledge of the network is often very costly in practice. Here we develop algorithms and guarantees for approximating the optimal seed set while bounding how much network information is collected. First, we study the guarantees achievable with a sublinear influence sample size. We provide an almost tight approximation algorithm with an additive $\epsilon n$ loss and show that the squared dependence of the sample size on $k$ is asymptotically optimal when $\epsilon$ is small. We then propose a probing algorithm that queries edges from the graph and uses them to find a seed set with the same almost tight approximation guarantee. We also provide a matching (up to logarithmic factors) lower bound on the required number of edges. This algorithm is implementable in field surveys or in crawling online networks. Our probing algorithm takes $p$ as an input, which may not be known in advance, and we show how to down-sample the probed edges to match the best estimate of $p$ if they were collected with a higher probability. Finally, we test our algorithms on an empirical network to quantify the tradeoff between the cost of obtaining more refined network information and the benefit of the added information for guiding improved seeding strategies.
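The down-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It rests on a standard thinning fact: if each edge was retained independently with probability $q$, keeping each retained edge with probability $p/q$ (for $p \le q$) leaves it with overall probability $p$. The function names `downsample_edges` and `spread`, the directed edge-pair representation, and the parameter names `p_probe` and `p_target` are assumptions made for illustration; they are not taken from the paper, and the paper's actual probing procedure may differ.

```python
import random
from collections import defaultdict, deque


def downsample_edges(edges, p_probe, p_target, rng):
    """Thin edges probed with probability p_probe down to probability p_target.

    Each edge is kept independently with probability p_target / p_probe,
    so its overall inclusion probability becomes p_target.
    """
    if not 0 < p_target <= p_probe <= 1:
        raise ValueError("need 0 < p_target <= p_probe <= 1")
    keep = p_target / p_probe
    return [e for e in edges if rng.random() < keep]


def spread(seeds, live_edges):
    """Return the set of nodes reachable from seeds along directed live edges."""
    adj = defaultdict(list)
    for u, v in live_edges:
        adj[u].append(v)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


# Hypothetical example: edges collected at probability 0.5, best estimate of p is 0.2.
rng = random.Random(0)
probed = [(0, 1), (1, 2), (2, 3), (3, 4)]
thinned = downsample_edges(probed, p_probe=0.5, p_target=0.2, rng=rng)
print(len(spread({0}, thinned)))
```

The thinned edge list can then be passed to whatever seed-selection routine was designed for edges sampled at probability $p$; here `spread` only measures the reach of a fixed seed set on one realization.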