This paper is concerned with the analysis of observations organized in matrix form whose elements are count data assumed to follow a Poisson or a multinomial distribution. We focus on the estimation of either the intensity matrix (Poisson case) or the compositional matrix (multinomial case), which is assumed to have a low-rank structure. We propose to construct an estimator that minimizes the negative log-likelihood regularized by a nuclear norm penalty. Our approach easily yields a low-rank matrix-valued estimator with positive entries, which belongs to the set of row-stochastic matrices in the multinomial case. Our main contribution is to propose a data-driven way to select the regularization parameter in the construction of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in such models, which generalize Stein's unbiased risk estimation originally proposed for Gaussian data. The evaluation of these quantities is a delicate problem, and we introduce novel methods to obtain accurate numerical approximations of such unbiased estimates. Simulated data are used to validate this way of selecting the regularization parameter for low-rank matrix estimation from count data. For data following a multinomial distribution, we also compare its performance to that of K-fold cross-validation. Examples from a survey study and metagenomics also illustrate the benefits of our approach for real data analysis.
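To make the Poisson case concrete, the following Python sketch minimizes a nuclear-norm-penalized Poisson negative log-likelihood by proximal gradient descent. It is only an illustration under stated assumptions: the intensity is parametrized as the entrywise exponential of a matrix Z (one common way to obtain positive entries, not confirmed by the abstract), the penalty is placed on Z, and the regularization parameter `lam` is fixed by hand rather than selected by the KL risk estimates discussed above. The names `svt`, `poisson_nll` and `fit_poisson_low_rank` are hypothetical.

```python
import numpy as np


def svt(Z, tau):
    """Soft-threshold the singular values of Z by tau (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def poisson_nll(Z, Y):
    """Poisson negative log-likelihood in Z = log(intensity), up to a constant in Y."""
    return np.sum(np.exp(Z) - Y * Z)


def fit_poisson_low_rank(Y, lam, n_iter=100):
    """Proximal gradient with backtracking; returns the estimated intensity exp(Z).

    Assumption: exponential link and nuclear norm penalty on Z (illustrative choice).
    """
    Z = np.log(np.maximum(Y, 0.5))  # hypothetical initialization that avoids log(0)
    for _outer in range(n_iter):
        grad = np.exp(Z) - Y  # gradient of the smooth term with respect to Z
        f_old = poisson_nll(Z, Y)
        t = 1.0
        for _inner in range(50):  # halve the step until the quadratic bound holds
            Z_new = svt(Z - t * grad, t * lam)
            diff = Z_new - Z
            bound = f_old + np.sum(grad * diff) + np.sum(diff**2) / (2 * t)
            if poisson_nll(Z_new, Y) <= bound + 1e-12 * abs(f_old):
                break
            t *= 0.5
        Z = Z_new
    return np.exp(Z)


# Synthetic example: rank-one log-intensity, Poisson counts.
rng = np.random.default_rng(0)
true_intensity = np.exp(0.5 * np.outer(rng.normal(size=20), rng.normal(size=15)))
Y = rng.poisson(true_intensity).astype(float)
X_hat = fit_poisson_low_rank(Y, lam=2.0)
```

In this sketch a larger `lam` shrinks more singular values of Z to zero, so the choice of `lam` controls the rank of the fitted log-intensity; this is the parameter that the data-driven selection rule is meant to tune.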