Non-negative matrix factorization (NMF) is widely used as a feature extraction technique for matrices with non-negative entries, such as image data, purchase histories, and other types of count data. In NMF, a non-negative matrix is decomposed into the product of two non-negative matrices, and the approximation accuracy is evaluated by a loss function. If the Kullback-Leibler divergence is chosen as the loss function, the estimation coincides with maximum likelihood under the assumption that the data entries are distributed according to a Poisson distribution. To address overdispersion, negative binomial matrix factorization has recently been proposed as an extension of the Poisson-based model. However, the negative binomial distribution often generates an excessive number of zeros, which limits its expressive capacity. In this study, we propose a non-negative matrix factorization based on the generalized Poisson distribution, which can flexibly accommodate overdispersion, and we introduce a maximum likelihood approach for parameter estimation. This methodology provides a more versatile framework than existing models, thereby extending the applicability of NMF to a broader class of count data.
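The relationship between the Kullback-Leibler loss and the Poisson likelihood, and the way a generalized Poisson likelihood adds a dispersion parameter, can be illustrated with a minimal sketch. The code below is not the estimation procedure of this study. It assumes Consul's standard parameterization of the generalized Poisson distribution, P(x) = θ(θ + λx)^(x-1) exp(-θ - λx) / x! with θ > 0 and 0 ≤ λ < 1 (mean θ/(1-λ), variance θ/(1-λ)³), and it assumes, purely for illustration, that the mean is tied to the factorization by θ = (1-λ)·WH. The function names (`kl_divergence`, `poisson_nll`, `kl_nmf`, `generalized_poisson_logpmf`, `generalized_poisson_nll`) are hypothetical, and the fitting routine is the classical multiplicative update rule for the KL loss rather than a maximum likelihood method for the generalized Poisson model.

```python
import math
import numpy as np

# Element-wise log-gamma, so that log(x!) = lgamma(x + 1).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def kl_divergence(X, M):
    """Generalized KL divergence sum(X log(X/M) - X + M), with 0 log 0 = 0."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    log_term = np.zeros_like(X)
    pos = X > 0
    log_term[pos] = X[pos] * np.log(X[pos] / M[pos])
    return float(np.sum(log_term - X + M))


def poisson_nll(X, M):
    """Negative Poisson log-likelihood of counts X with positive means M."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    return float(np.sum(M - X * np.log(M) + _lgamma(X + 1.0)))


def kl_nmf(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Classical multiplicative updates for NMF under the KL loss."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, m))
    for _ in range(n_iter):
        R = X / (W @ H + eps)                    # element-wise ratio X / (WH)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        R = X / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, H


def generalized_poisson_logpmf(x, theta, lam):
    """Log-pmf of Consul's generalized Poisson (assumed parameterization)."""
    x = np.asarray(x, dtype=float)
    return (np.log(theta) + (x - 1.0) * np.log(theta + lam * x)
            - theta - lam * x - _lgamma(x + 1.0))


def generalized_poisson_nll(X, M, lam):
    """Negative log-likelihood with mean M; theta = (1 - lam) * M is an assumption."""
    theta = (1.0 - lam) * np.asarray(M, dtype=float)
    return float(-np.sum(generalized_poisson_logpmf(X, theta, lam)))
```

For a fixed data matrix, `kl_divergence(X, M) - poisson_nll(X, M)` does not depend on `M`, which is why minimizing the KL loss is equivalent to Poisson maximum likelihood. Setting `lam = 0` in `generalized_poisson_nll` recovers the Poisson case, while `lam > 0` inflates the variance-to-mean ratio to 1/(1-λ)² under the parameterization assumed here.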