In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a "topic" is a pattern of word use that we expect to evolve over the course of the collection. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. In contrast to the cDTM, the original discrete-time dynamic topic model (dDTM) requires that time be discretized. Moreover, the complexity of variational inference for the dDTM grows quickly as time granularity increases, a drawback which limits fine-grained discretization. We demonstrate the cDTM on two news corpora, reporting both predictive perplexity and the novel task of time stamp prediction.
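To make the Brownian motion assumption concrete, here is a minimal sketch of how one topic might drift between irregularly spaced time stamps. It assumes that a topic is a vector of real-valued parameters mapped to word probabilities by a softmax, and that the variance of each increment grows linearly with the elapsed time. The function names, the `variance` value, and the standard normal starting point are illustrative assumptions. This sketch only simulates the generative process; it is not the variational inference algorithm.

```python
import math
import random


def softmax(params):
    """Map real-valued topic parameters to a probability distribution over words."""
    peak = max(params)  # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in params]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_topic_drift(timestamps, vocab_size, variance=0.1, seed=0):
    """Sample one topic's word distribution at each (sorted) time stamp.

    Hypothetical helper: between consecutive time stamps s < t, every
    parameter receives a Gaussian increment with variance `variance * (t - s)`,
    which is the defining property of Brownian motion.
    """
    rng = random.Random(seed)
    # Assumed starting point: standard normal parameters at the first time stamp.
    beta = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    trajectory = []
    prev_t = timestamps[0]
    for t in timestamps:
        dt = t - prev_t
        if dt < 0:
            raise ValueError("timestamps must be sorted in ascending order")
        # Longer gaps allow larger drift; a zero gap leaves the topic unchanged.
        std = math.sqrt(variance * dt)
        beta = [b + rng.gauss(0.0, std) for b in beta]
        trajectory.append(softmax(beta))
        prev_t = t
    return trajectory


# Time stamps need not lie on a regular grid.
topics = simulate_topic_drift([0.0, 0.5, 3.0, 3.01], vocab_size=5)
```

Because each increment depends only on the elapsed time, no discretization grid is needed: two documents a minute apart and two documents a year apart are handled by the same rule.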