Principal component analysis (PCA) is a popular dimension reduction technique for vector data. Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data, which can substantially reduce the number of parameters in PCA while yielding satisfactory performance. However, FPCA is based on the Gaussian assumption and is therefore susceptible to outliers. Although the multivariate $t$ distribution has a very long history as a robust modeling tool for vector data, its application to matrix data is very limited. The main reason is that the dimension of the vectorized matrix data is often very high, and the higher the dimension, the lower the breakdown point, which measures robustness. To solve the robustness problem suffered by FPCA and make it applicable to matrix data, in this paper we propose a robust extension of FPCA (RFPCA), which is built upon a $t$-type distribution called the matrix-variate $t$ distribution. Like the multivariate $t$ distribution, the matrix-variate $t$ distribution can adaptively down-weight outliers and yield robust estimates. We develop a fast EM-type algorithm for parameter estimation. Experiments on synthetic and real-world datasets reveal that RFPCA compares favorably with several related methods and that it is a simple but powerful tool for matrix-valued outlier detection.
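To make the down-weighting idea concrete, the following is a minimal sketch of EM for the ordinary multivariate $t$ distribution on vector data. It is not the RFPCA algorithm or the matrix-variate model proposed here. The function name `fit_multivariate_t`, the fixed degrees of freedom `nu`, and the iteration count are assumptions made for illustration. Each observation receives the weight $(\nu + d)/(\nu + \delta_i)$, where $\delta_i$ is its squared Mahalanobis distance, so distant points contribute little to the location and scatter estimates.

```python
import numpy as np


def fit_multivariate_t(X, nu=3.0, n_iter=50):
    """Illustrative EM for a multivariate t with fixed nu (hypothetical helper).

    Returns the location, the scatter matrix, and the per-sample weights.
    """
    n, d = X.shape
    mu = X.mean(axis=0)              # non-robust starting values
    sigma = np.cov(X, rowvar=False)

    def weights(mu, sigma):
        diff = X - mu
        # Squared Mahalanobis distance of every row to the current location.
        delta = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
        # E-step: large distances give small weights (down-weighting).
        return (nu + d) / (nu + delta)

    for _ in range(n_iter):
        w = weights(mu, sigma)
        # M-step: weighted mean and weighted scatter.
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        sigma = (w[:, None] * diff).T @ diff / n

    return mu, sigma, weights(mu, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))
    X[:5] += 20.0                    # five gross outliers (synthetic, for illustration)
    mu, sigma, w = fit_multivariate_t(X)
    print("sample mean:", X.mean(axis=0))
    print("t location: ", mu)
    print("largest outlier weight:", w[:5].max())
    print("smallest inlier weight:", w[5:].min())
```

The returned weights double as a simple outlier score: observations with the smallest weights are the ones the fit trusts least. This is the vector-data analogue of the outlier detection use mentioned above.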