Topic models and all their variants analyse text by learning meaningful representations from word co-occurrences. As pointed out by Williamson et al. (2010), such models implicitly assume that the probability that a topic is active and its proportion within each document are positively correlated. This correlation can be strongly detrimental for documents created over time, because recent documents are likely to be better described by new, and hence rare, topics. In this work we leverage recent advances in neural variational inference and present an alternative neural approach to the dynamic Focused Topic Model. Specifically, we develop a neural model for topic evolution that uses sequences of Bernoulli random variables to track the appearance of topics, thereby decoupling their activities from their proportions. We evaluate our model on three datasets (the UN general debates, the collection of NeurIPS papers, and the ACL Anthology dataset) and show that it (i) outperforms state-of-the-art topic models on generalization tasks and (ii) performs comparably to them on prediction tasks, while employing roughly the same number of parameters and converging about twice as fast. Source code to reproduce our experiments is available online.
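To make the decoupling idea concrete, the following is a minimal illustrative sketch, not the model proposed in this work. It assumes a simple generative view in which each topic k at time step t has a Bernoulli activity indicator with probability `pi[t, k]`, and the proportions of the active topics are drawn from a Dirichlet distribution (via normalized Gamma draws) with concentration `phi[k]`. The names `sample_focused_proportions`, `pi`, and `phi`, the example values, and the fallback that forces at least one active topic are all assumptions made for illustration.

```python
import numpy as np


def sample_focused_proportions(pi_t, phi, rng):
    """Sample topic proportions for one document with a Bernoulli activity mask.

    pi_t : (K,) activity probabilities of the K topics at one time step (hypothetical).
    phi  : (K,) positive concentrations controlling proportions (hypothetical).
    Returns the boolean mask b and the proportions theta (zero where inactive).
    """
    # b_k ~ Bernoulli(pi_t[k]): decides WHETHER a topic appears in the document.
    b = rng.random(pi_t.shape) < pi_t
    if not b.any():
        # Assumption for this sketch: force the most probable topic to be active
        # so that the proportions are always well defined.
        b[np.argmax(pi_t)] = True

    # Proportions over the active topics only: normalized Gamma draws are
    # Dirichlet distributed, so HOW MUCH a topic contributes depends on phi,
    # not on how likely the topic was to be active.
    theta = np.zeros(phi.shape, dtype=float)
    gamma_draws = rng.gamma(phi[b])
    theta[b] = gamma_draws / gamma_draws.sum()
    return b, theta


rng = np.random.default_rng(0)

# Two time steps, three topics. Topic 2 is rare (low activity probability)
# but has a large concentration, so it dominates a document when it is active.
pi = np.array([[0.9, 0.8, 0.01],
               [0.9, 0.8, 0.10]])
phi = np.array([1.0, 1.0, 20.0])

samples = [sample_focused_proportions(pi[t], phi, rng) for t in range(pi.shape[0])]
```

In this sketch a topic can be rarely active yet carry a large share of a document whenever it does appear, which is the behaviour a positive correlation between activity and proportion would rule out.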