Traditional principal component analysis (PCA) is widely used in high-dimensional data analysis, but it requires the data to be expressed as a matrix of continuous observations. To overcome these limitations, a new method called flexible PCA (FPCA) is proposed for exponential family distributions. The goal is to ensure that it can be applied to arbitrarily shaped regions with either count or continuous observations. The methodology of FPCA is developed within the framework of generalized linear models, which provides statistical models for FPCA that are not limited to matrix representations of the data. A maximum likelihood approach is proposed to derive the decomposition when the number of principal components (PCs) is known. This naturally leads to a penalized likelihood approach for determining the number of PCs when it is unknown. After being adapted to missing data problems, the proposed method is compared with previous PCA methods for missing data. In the simulation study, FPCA always performs better than its competitors. In the application, the proposed method is used to reduce the dimensionality of arbitrarily shaped sub-regions of images and of the global spread patterns of COVID-19, under normal and Poisson distributions, respectively.
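To make the idea of a likelihood-based decomposition on an arbitrarily shaped region more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a rank-k natural-parameter matrix theta = U V^T with canonical links (log link for Poisson, identity link with unit variance for normal), evaluates the likelihood only on cells flagged by a Boolean mask, and maximizes it by gradient ascent with backtracking. The names `masked_loglik` and `fit_low_rank`, the elliptical region, and the simulated counts are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_loglik(Y, mask, theta, family):
    """Log-likelihood (up to additive constants) over observed cells only."""
    if family == "poisson":
        ll = Y * theta - np.exp(theta)   # canonical log link
    elif family == "normal":
        ll = -0.5 * (Y - theta) ** 2     # identity link, unit variance assumed
    else:
        raise ValueError(f"unsupported family: {family}")
    return float(np.sum(ll[mask]))


def fit_low_rank(Y, mask, k, family="poisson", n_iter=200, seed=0):
    """Gradient ascent on the masked likelihood of theta = U @ V.T."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = 0.01 * rng.standard_normal((n, k))
    V = 0.01 * rng.standard_normal((p, k))
    ll = masked_loglik(Y, mask, U @ V.T, family)
    history = [ll]
    step = 1.0
    # Overflow in exp() during a too-large trial step is handled by backtracking.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_iter):
            theta = U @ V.T
            mean = np.exp(theta) if family == "poisson" else theta
            # Score with respect to theta; cells outside the region contribute 0.
            G = np.where(mask, Y - mean, 0.0)
            gU, gV = G @ V, G.T @ U
            # Backtracking: halve the step until the likelihood does not decrease.
            while step > 1e-12:
                U_new, V_new = U + step * gU, V + step * gV
                ll_new = masked_loglik(Y, mask, U_new @ V_new.T, family)
                if np.isfinite(ll_new) and ll_new >= ll:
                    break
                step *= 0.5
            else:
                break  # no improving step found; stop early
            U, V, ll = U_new, V_new, ll_new
            history.append(ll)
            step *= 2.0  # try a larger step on the next iteration
    return U, V, history


# Hypothetical example: Poisson counts observed on an elliptical region.
rng = np.random.default_rng(1)
rows, cols = np.indices((30, 20))
mask = (rows - 15) ** 2 / 15**2 + (cols - 10) ** 2 / 10**2 <= 1.0
Y = rng.poisson(lam=3.0, size=(30, 20)).astype(float)
U, V, history = fit_low_rank(Y, mask, k=2, family="poisson")
```

Because every accepted step must not lower the masked likelihood, `history` is non-decreasing by construction. The sketch imposes no identifiability constraints on U and V, and it does not choose k. One way to do so, again as an assumption rather than the paper's criterion, would be to refit for several values of k and compare the fits after subtracting a complexity penalty.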