Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models and also limit scalability. In this paper, we present several new results on DTMs. First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have long-term memory, or that are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models, building on ideas from stochastic variational inference and sparse Gaussian processes. This way, we can fit a rich family of DTMs to massive data. Our experiments on several large-scale datasets show that our generalized model reveals interesting patterns that were not accessible to previous approaches.
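To make the first contribution concrete, here is a minimal sketch of how the choice of GP kernel shapes a topic's temporal behaviour. It contrasts the Wiener-process covariance k(s, t) = min(s, t), which underlies the classic DTM, with a squared-exponential (RBF) covariance that yields smooth trajectories. The time grid, lengthscale, and kernel choices below are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def wiener_kernel(s, t):
    """Wiener-process covariance k(s, t) = min(s, t): the prior of the classic DTM."""
    return np.minimum(s[:, None], t[None, :])

def rbf_kernel(s, t, lengthscale=5.0):
    """Squared-exponential covariance: encourages smoothly evolving topic parameters."""
    d = s[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Hypothetical time grid, e.g. one point per time slice of a corpus.
times = np.linspace(0.0, 20.0, 50)

rng = np.random.default_rng(0)
for name, K in [("wiener (random walk)", wiener_kernel(times, times)),
                ("rbf (smooth)", rbf_kernel(times, times))]:
    # Draw one trajectory of a single topic parameter under each GP prior;
    # the jitter term keeps the covariance numerically positive definite.
    sample = rng.multivariate_normal(np.zeros(len(times)), K + 1e-8 * np.eye(len(times)))
    print(name, np.round(sample[:5], 3))
```

Other behaviours mentioned in the abstract, such as long-term memory or temporal concentration for event detection, would correspond to swapping in different kernels in the same way.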