We investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared-error loss. While similar approaches have been explored before, our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm flexible enough to accommodate structural zeros and entry weights. Moreover, by adapting results from GLM theory, we support these decompositions by (i) establishing strong consistency under the GLM setup, (ii) checking the adequacy of a chosen exponential family via a generalized Hosmer-Lemeshow test, and (iii) determining the rank of the decomposition via a maximum eigenvalue gap method. To further support our findings, we conduct simulation studies to assess robustness to the decomposition's assumptions, along with extensive case studies using benchmark datasets from face image recognition, natural language processing, network analysis, and biomedical studies. Our theoretical and empirical results indicate that the proposed decomposition is more flexible, general, and robust, and can thus improve performance relative to similar methods. To facilitate applications, we also provide an R package with efficient model fitting as well as exponential family and rank determination.
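To make the loss and the rank rule concrete, here is a minimal, hypothetical sketch in plain Python. It does not reproduce the authors' algorithm or R package. It assumes a Poisson family with its canonical log link, so that mu_ij = exp(sum_l U_il V_jl). It also assumes that structural zeros are handled by giving those entries zero weight, and that the maximum eigenvalue gap rule picks the k with the largest drop lambda_k - lambda_{k+1}. The paper's exact criterion may differ, and every function name below is illustrative.

```python
import math


def poisson_deviance(x, mu):
    """Unit Poisson deviance 2 * (x*log(x/mu) - (x - mu)), using 0*log(0) = 0."""
    term = x * math.log(x / mu) if x > 0 else 0.0
    return 2.0 * (term - (x - mu))


def weighted_deviance(X, U, V, W):
    """Total weighted deviance of X under a rank-k log-linear model (hypothetical helper).

    X: observed counts as a list of rows (n x m).
    U: n x k and V: m x k factor matrices; eta_ij = sum_l U[i][l] * V[j][l],
       and mu_ij = exp(eta_ij) under the assumed log link.
    W: n x m entry weights; a weight of 0 removes the entry from the loss
       (assumed here as the treatment of structural zeros).
    """
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if W[i][j] == 0:
                continue  # excluded entry contributes nothing
            eta = sum(u * v for u, v in zip(U[i], V[j]))
            total += W[i][j] * poisson_deviance(x, math.exp(eta))
    return total


def eigen_gap_rank(eigenvalues):
    """Return the rank k (1-based) with the largest gap lambda_k - lambda_{k+1}."""
    lam = sorted(eigenvalues, reverse=True)
    gaps = [lam[k] - lam[k + 1] for k in range(len(lam) - 1)]
    return gaps.index(max(gaps)) + 1


if __name__ == "__main__":
    X = [[1, 0], [2, 5]]
    U = [[0.0], [0.0]]          # rank-1 factors giving mu_ij = 1 everywhere
    V = [[0.0], [0.0]]
    W = [[1, 1], [1, 0]]        # entry (1, 1) treated as a structural zero
    print(weighted_deviance(X, U, V, W))
    print(eigen_gap_rank([10.0, 9.0, 1.0, 0.5]))
```

Fitting would then minimize `weighted_deviance` over `U` and `V`, for example by alternating GLM-style updates. This sketch only evaluates the objective, so that the role of the weights and the deviance stays visible.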