Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a small set of seed nodes. Recent studies have followed a non-adaptive setting, in which the seed nodes are selected before the diffusion process starts and the network parameters are updated only after the diffusion stops. We consider an adaptive version of the content-dependent online influence maximization problem, in which seed nodes are activated sequentially based on real-time feedback. In this paper, we formulate the problem as an infinite-horizon discounted MDP under a linear diffusion process and present a model-based reinforcement learning solution. Our algorithm maintains an estimate of the network model and selects seed users adaptively, exploring the social network while optimistically improving its policy. We establish a $\widetilde O(\sqrt{T})$ regret bound for our algorithm. Empirical evaluations on synthetic networks demonstrate the efficiency of our algorithm.
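As a brief illustration of the stated guarantee (using standard notation not taken from the abstract itself): if $V^{\pi}(s_0)$ denotes the infinite-horizon discounted value of policy $\pi$ from the initial state $s_0$, $\pi^\star$ the optimal policy, and $\pi_k$ the policy executed in episode $k$, then the cumulative regret over $T$ episodes is typically defined as
\[
\mathrm{Regret}(T) \;=\; \sum_{k=1}^{T} \left( V^{\pi^\star}(s_0) - V^{\pi_k}(s_0) \right),
\]
and the bound above asserts $\mathrm{Regret}(T) = \widetilde O(\sqrt{T})$, i.e., the per-episode suboptimality vanishes as $T$ grows.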