Generalized Poisson Matrix Factorization for Overdispersed Count Data

Non-negative matrix factorization (NMF) is widely used as a feature extraction technique for matrices with non-negative entries, such as image data, purchase histories, and other count data. NMF approximates a non-negative matrix by the product of two non-negative matrices, and a loss function measures the quality of the approximation. When the Kullback-Leibler divergence is chosen as the loss, minimizing it coincides with maximum likelihood estimation under the assumption that the data entries follow a Poisson distribution. The Poisson distribution, however, constrains the variance to equal the mean, so it cannot represent overdispersed data, in which the variance exceeds the mean. To address overdispersion, negative binomial matrix factorization has recently been proposed as an extension of the Poisson-based model. However, the negative binomial distribution often generates an excessive number of zeros, which limits its expressive capacity. In this study, we propose a non-negative matrix factorization based on the generalized Poisson distribution, which can flexibly accommodate overdispersion, and we introduce a maximum likelihood approach for parameter estimation. The resulting framework is more versatile than existing models and extends the applicability of NMF to a broader class of count data.
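The two distributional claims above can be checked numerically with a minimal sketch. The abstract does not state which parameterization of the generalized Poisson distribution is used, so the code assumes Consul's form, P(X = x) = θ(θ + λx)^(x−1) exp(−θ − λx) / x! with θ > 0 and 0 ≤ λ < 1, whose mean is θ/(1 − λ) and whose variance is θ/(1 − λ)^3. All function names are hypothetical and chosen for illustration only.

```python
import math


def generalized_kl(xs, ms):
    """Generalized KL divergence sum_i [x log(x/m) - x + m], with 0*log(0) = 0."""
    total = 0.0
    for x, m in zip(xs, ms):
        if x > 0:
            total += x * math.log(x / m)
        total += m - x
    return total


def poisson_nll(xs, ms):
    """Negative Poisson log-likelihood of counts xs with means ms."""
    return sum(m - x * math.log(m) + math.lgamma(x + 1) for x, m in zip(xs, ms))


def gp_logpmf(x, theta, lam):
    """Log-pmf of the generalized Poisson distribution.

    Assumes Consul's parameterization with theta > 0 and 0 <= lam < 1;
    lam = 0 recovers the ordinary Poisson distribution with mean theta.
    """
    return (math.log(theta) + (x - 1) * math.log(theta + lam * x)
            - theta - lam * x - math.lgamma(x + 1))


def gp_moments(theta, lam, x_max=1000):
    """Total mass, mean, and variance obtained by summing the pmf over 0..x_max."""
    probs = [math.exp(gp_logpmf(x, theta, lam)) for x in range(x_max + 1)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    counts = [0, 3, 7, 1]          # toy data entries
    fit_a = [0.5, 2.0, 6.0, 1.5]   # two candidate reconstructions (entries of W @ H)
    fit_b = [1.0, 4.0, 5.0, 0.2]
    # The gap between the two losses does not depend on the reconstruction,
    # so both losses are minimized by the same factors.
    print(poisson_nll(counts, fit_a) - generalized_kl(counts, fit_a))
    print(poisson_nll(counts, fit_b) - generalized_kl(counts, fit_b))
    # With lam > 0 the variance exceeds the mean (overdispersion).
    print(gp_moments(theta=3.0, lam=0.5))
```

Because the Poisson negative log-likelihood and the generalized KL divergence differ only by a term that depends on the data, any reconstruction that minimizes one also minimizes the other. In the assumed parameterization, the single extra parameter λ inflates the variance relative to the mean by a factor of 1/(1 − λ)^2, which is what lets the model absorb overdispersion.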
