We investigate how algorithms for non-negative matrix factorization (NMF) can improve parameter estimation in topic models. While several papers have studied connections between NMF and topic models, none have suggested leveraging these connections to develop new algorithms for fitting topic models. Importantly, NMF avoids the "sum-to-one" constraints on the topic model parameters, resulting in an optimization problem with simpler structure and more efficient computations. Building on recent advances in optimization algorithms for NMF, we show that first solving the NMF problem and then recovering the topic model fit can produce substantially better fits, in less time, than standard algorithms for topic models. While we focus primarily on maximum likelihood estimation, we show that this approach also has the potential to improve variational inference for topic models. Our methods are implemented in the R package fastTopics.
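To make the "recover the topic model fit" step concrete, the following minimal sketch rescales a pair of non-negative factors so that they satisfy the sum-to-one constraints while leaving the product of the factors unchanged up to a per-document scale. It is an illustration of the underlying algebraic identity only, not the fastTopics implementation: the function name `nmf_to_topic_model`, the matrix layout (`L` is documents by topics, `F` is words by topics), and the toy numbers are all assumptions made for this example.

```python
def nmf_to_topic_model(L, F):
    """Rescale non-negative factors into sum-to-one form (illustrative sketch).

    L: n x K list of lists (document loadings), F: m x K list of lists
    (word factors). Returns (L_star, F_star, s), where each column of F_star
    sums to 1, each row of L_star sums to 1, and
    sum_k L[i][k] * F[j][k] == s[i] * sum_k L_star[i][k] * F_star[j][k].
    """
    n, m, K = len(L), len(F), len(F[0])

    # Column sums of F: the scale absorbed by each topic.
    u = [sum(F[j][k] for j in range(m)) for k in range(K)]
    if any(c <= 0 for c in u):
        raise ValueError("every column of F must have a positive sum")

    # Normalize each column of F so it is a distribution over words.
    F_star = [[F[j][k] / u[k] for k in range(K)] for j in range(m)]

    # Move the removed scale into L so the product L F^T is unchanged.
    L_scaled = [[L[i][k] * u[k] for k in range(K)] for i in range(n)]

    # Row sums of the rescaled L act as per-document "size" factors.
    s = [sum(row) for row in L_scaled]
    if any(c <= 0 for c in s):
        raise ValueError("every row of the rescaled L must have a positive sum")

    # Normalize each row of L so it gives topic proportions.
    L_star = [[v / s_i for v in row] for row, s_i in zip(L_scaled, s)]
    return L_star, F_star, s


# Toy factors (made-up numbers): 2 documents, 3 words, 2 topics.
L = [[2.0, 1.0], [0.5, 3.0]]
F = [[1.0, 0.0], [2.0, 1.0], [1.0, 3.0]]
L_star, F_star, s = nmf_to_topic_model(L, F)
```

Because the rescaling is applied once, after optimization, the NMF solver itself never has to enforce the sum-to-one constraints.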