Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to approximating a nonnegative matrix as the product of two nonnegative matrices of lower rank. In this paper, we show that this factorization can be combined with regression on a continuous response variable. In practice, the method performs better than regression done after the topics are identified, and it retains interpretability.
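To make the factorization concrete, the following is a minimal sketch of the two-stage baseline mentioned above: topics are first identified by plain nonnegative matrix factorization, and the response is then regressed on the resulting topic weights. It is not the combined method of this paper. The multiplicative update rule, the function name `nmf`, the rows-as-documents orientation, the rank, and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate nonnegative X (docs x terms) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error
    (an assumed choice of solver, not one taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, rank)) + eps   # document-topic weights
    H = rng.random((rank, n_terms)) + eps  # topic-term weights
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 documents, 5 terms, one continuous response per document.
rng = np.random.default_rng(1)
X = rng.random((6, 5))
y = rng.random(6)

# Stage 1: identify topics without looking at the response.
W, H = nmf(X, rank=2)

# Stage 2: ordinary least squares of the response on the topic weights (plus intercept).
A = np.column_stack([W, np.ones(len(W))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In this baseline the topics in `H` are fitted without any knowledge of `y`, which is the limitation that motivates combining the factorization and the regression into a single problem.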