Do Neural Topic Models Really Need Dropout? Analysis of the Effect of Dropout in Topic Modeling

Dropout is a widely used regularization trick for resolving overfitting in large feedforward neural networks trained on small datasets, which tend to perform poorly on the held-out test set. Although the effectiveness of this regularization trick has been studied extensively for convolutional neural networks, there is a lack of analysis of it for unsupervised models, and in particular for VAE-based neural topic models. In this paper, we analyze the consequences of dropout in the encoder as well as in the decoder of the VAE architecture in three widely used neural topic models, namely the contextualized topic model (CTM), ProdLDA, and the embedded topic model (ETM), using four publicly available datasets. We characterize the effect of dropout on these models in terms of the quality and predictive performance of the generated topics.