In this paper, we develop an efficient nonparametric Bayesian method for estimating the kernel function of Hawkes processes. The nonparametric Bayesian approach is important because it yields flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Exploiting the stationarity of the Hawkes process, we efficiently sample random branching structures, which split the Hawkes process into clusters of Poisson processes. We derive two algorithms, a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization, and we show that both have linear time complexity, theoretically and empirically. On synthetic data, we show that our methods can infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state of the art in goodness-of-fit and that their running time scales linearly with the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity of different content types, such as music or pet videos.
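To make the cluster-representation idea concrete, here is a minimal sketch of the generic technique. It is not the authors' algorithm. Given sorted event times, a background rate `mu`, and a current kernel guess, each event is assigned either to the background (an "immigrant") or to one earlier event as its parent. The probabilities are proportional to `mu` and to the kernel evaluated at the time lag. Conditional on that branching structure, the children of each parent form a Poisson process, so the kernel can be re-estimated from the pooled child delays.

Several pieces are assumptions made only for illustration:
- The names `exp_kernel`, `sample_branching`, `histogram_kernel`, and `support` are hypothetical.
- The exponential kernel used for sampling is an arbitrary choice.
- Truncating the kernel support is one common way to keep the cost roughly linear in the number of events.
- The maximum-likelihood histogram update stands in for the paper's Bayesian posterior.
- The toy event times are uniform draws, not a simulated Hawkes process.

```python
import math
import random
from typing import Callable, List, Sequence


def exp_kernel(dt: float, alpha: float = 0.5, beta: float = 1.0) -> float:
    """Hypothetical exponential triggering kernel phi(dt) = alpha*beta*exp(-beta*dt)."""
    return alpha * beta * math.exp(-beta * dt)


def sample_branching(times: Sequence[float], mu: float,
                     kernel: Callable[[float], float], support: float,
                     rng: random.Random) -> List[int]:
    """Sample one branching structure for sorted event times.

    parents[i] == -1 marks event i as an immigrant (background event);
    otherwise parents[i] is the index of the earlier event that triggered it.
    P(immigrant) is proportional to mu and P(parent = j) to kernel(t_i - t_j).
    Assumption: only earlier events within `support` of t_i are candidates,
    so the cost per event is bounded by the number of events in that window.
    """
    if mu <= 0 or support <= 0:
        raise ValueError("mu and support must be positive")
    parents: List[int] = []
    start = 0  # first index still inside the support window
    for i, t_i in enumerate(times):
        # Advance the window start; it never passes i because times are sorted.
        while times[start] < t_i - support:
            start += 1
        candidates = list(range(start, i))
        weights = [mu] + [kernel(t_i - times[j]) for j in candidates]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        parents.append(-1 if k == 0 else candidates[k - 1])
    return parents


def histogram_kernel(times: Sequence[float], parents: Sequence[int],
                     horizon: float, edges: Sequence[float]) -> List[float]:
    """Maximum-likelihood piecewise-constant kernel, given a branching structure.

    The children of event j form a Poisson process with intensity phi(s) on
    s in [0, horizon - t_j]. Pooling all parents, the MLE of phi on the bin
    [a, b) is (number of child delays in [a, b)) / (total exposure in [a, b)).
    """
    heights: List[float] = []
    for a, b in zip(edges[:-1], edges[1:]):
        count = sum(1 for i, p in enumerate(parents)
                    if p >= 0 and a <= times[i] - times[p] < b)
        exposure = sum(max(0.0, min(b, horizon - t) - a) for t in times)
        heights.append(count / exposure if exposure > 0 else 0.0)
    return heights


# Usage on toy data (uniform times, for illustration only).
rng = random.Random(0)
horizon = 100.0
times = sorted(rng.uniform(0.0, horizon) for _ in range(200))
parents = sample_branching(times, mu=1.0, kernel=exp_kernel, support=10.0, rng=rng)
heights = histogram_kernel(times, parents, horizon, edges=[0.0, 1.0, 2.0, 5.0, 10.0])
mu_hat = parents.count(-1) / horizon  # immigrants form a Poisson process of rate mu
```

A Gibbs-style scheme would alternate between the two steps, resampling the branching structure under the current kernel and then updating the kernel. In the Bayesian setting, this update would draw bin heights from a posterior rather than taking the ratio above. An EM-style scheme would replace the sampled parents with their expected assignments.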