Neural topic models have attracted a surge of interest in extracting topics from text automatically, since they avoid the complicated inference derivations of conventional topic models. However, few neural topic models incorporate the word relatedness information captured in word embeddings into the modeling process. To address this issue, we propose a novel topic modeling approach called the Variational Gaussian Topic Model (VaGTM). Built on the variational auto-encoder, VaGTM models each topic with a multivariate Gaussian in the decoder to incorporate word relatedness. Furthermore, to address the limitation that pre-trained word embeddings of topic-associated words do not follow a multivariate Gaussian distribution, we extend VaGTM to the Variational Gaussian Topic Model with Invertible neural Projections (VaGTM-IP). Experiments on three benchmark text corpora verify the effectiveness of VaGTM and VaGTM-IP. The results show that both models outperform several competitive baselines and obtain more coherent topics.
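The core idea of modeling each topic as a multivariate Gaussian over the word-embedding space can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the embedding matrix, topic parameters, and the diagonal-covariance assumption are all stand-ins introduced here for clarity; in VaGTM the topic means and variances would be learned jointly with the variational auto-encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 1000, 50, 10          # vocabulary size, embedding dim, number of topics

word_emb = rng.normal(size=(V, D))   # stand-in for pre-trained word embeddings
topic_mu = rng.normal(size=(K, D))   # topic means (learned in the real model)
topic_logvar = np.zeros((K, D))      # diagonal log-variances (learned in the real model)

def topic_word_dist(emb, mu, logvar):
    """Evaluate each word embedding under each diagonal Gaussian topic and
    normalize over the vocabulary, yielding a (K, V) topic-word matrix."""
    var = np.exp(logvar)                               # (K, D)
    diff = emb[None, :, :] - mu[:, None, :]            # (K, V, D)
    log_pdf = -0.5 * ((diff ** 2) / var[:, None, :]
                      + logvar[:, None, :]
                      + np.log(2 * np.pi)).sum(-1)     # (K, V) Gaussian log-density
    log_pdf -= log_pdf.max(axis=1, keepdims=True)      # subtract max for stability
    p = np.exp(log_pdf)
    return p / p.sum(axis=1, keepdims=True)            # rows sum to 1

beta = topic_word_dist(word_emb, topic_mu, topic_logvar)
print(beta.shape)  # each of the K rows is a distribution over the V words
```

Words whose embeddings lie near a topic's mean receive high probability under that topic, which is how word relatedness in the embedding space shapes the topic-word distribution.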