Information spread on networks can be modeled efficiently by considering three features: the content of documents, their time of publication relative to other publications, and the position of the spreader in the network. Most previous works model at most two of these features jointly, or rely on heavily parametric approaches. Building on the recent Dirichlet-Point process literature, we introduce the Houston (Hidden Online User-Topic Network) model, which jointly considers all three features in a non-parametric, unsupervised framework. It infers the underlying dynamic, topic-dependent diffusion networks in continuous time, along with the topics themselves. The input is an unlabeled stream of triplets of the form \textit{(time of publication, information's content, spreading entity)}. Online inference is conducted using a sequential Monte Carlo algorithm whose running time scales linearly with the size of the dataset. Our approach yields substantial improvements over existing baselines on both cluster recovery and subnetwork inference tasks.
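To make the input format concrete, the following is a minimal sketch of how such a stream of triplets might be represented and consumed in a single chronological pass. The names `Event`, `chronological`, and `consume` are hypothetical and do not come from the paper. The per-event update is a placeholder counter, not the model's sequential Monte Carlo step.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One observation: (time of publication, information's content, spreading entity)."""
    time: float
    content: str
    spreader: str


def chronological(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events ordered by publication time.

    The sort is only needed for this toy list; a real stream is assumed
    to arrive already ordered in time.
    """
    return iter(sorted(events, key=lambda e: e.time))


def consume(events: Iterable[Event]) -> Counter:
    """Visit each event once, doing constant work per event.

    The Counter update is a placeholder for a per-event inference step;
    it is NOT the model's sequential Monte Carlo update.
    """
    per_spreader = Counter()
    for event in chronological(events):
        per_spreader[event.spreader] += 1
    return per_spreader


# Toy, made-up data purely for illustration.
toy_stream = [
    Event(2.5, "storm warning issued", "user_b"),
    Event(0.0, "storm forming offshore", "user_a"),
    Event(4.1, "storm makes landfall", "user_a"),
]
counts = consume(toy_stream)
```

Because the loop touches each event exactly once with constant work, its cost grows linearly with the number of events, which is the kind of scaling the abstract attributes to the online inference procedure.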