Flexible Principal Component Analysis for Exponential Family Distributions

Traditional principal component analysis (PCA) is well known in high-dimensional data analysis, but it requires the data to be expressed as a matrix of continuous observations. To overcome these limitations, a new method called flexible PCA (FPCA) for exponential family distributions is proposed. The goal is to make it applicable to arbitrarily shaped regions with either count or continuous observations. The methodology of FPCA is developed within the framework of generalized linear models, which provides statistical models for FPCA that are not limited to matrix representations of the data. A maximum likelihood approach is proposed to derive the decomposition when the number of principal components (PCs) is known, and this naturally induces a penalized likelihood approach to determine the number of PCs when it is unknown. After the method is modified for missing-data problems, it is compared with previous PCA methods for missing data. The simulation study shows that FPCA always outperforms its competitors. In the application, the proposed method is used to reduce the dimensionality of arbitrarily shaped sub-regions of images under the normal distribution and of the global spread patterns of COVID-19 under the Poisson distribution.
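The abstract does not spell out the FPCA estimator, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a Poisson model with log link, log E[X_ij] = m_j + U[i] · V[j], fitted by maximum likelihood over the observed entries only. A boolean `mask` stands in for an arbitrarily shaped region or for missing data. The alternating Poisson-GLM Newton updates, the BIC-type penalty used to pick the number of components, and all function names (`fit_poisson_pca`, `select_rank_bic`) are hypothetical choices made for illustration.

```python
import numpy as np


def _poisson_loglik(y, eta):
    # Poisson log-likelihood with log link, dropping the constant -log(y!)
    return float(np.sum(y * eta - np.exp(eta)))


def _poisson_glm_step(Z, y, offset, beta, n_iter=5):
    """Damped Newton (IRLS) steps for y ~ Poisson(exp(offset + Z @ beta))."""
    for _ in range(n_iter):
        eta = offset + Z @ beta
        mu = np.exp(eta)
        grad = Z.T @ (y - mu)
        hess = Z.T @ (Z * mu[:, None]) + 1e-6 * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        old, t = _poisson_loglik(y, eta), 1.0
        # halve the step until the likelihood does not decrease (NaN counts as a decrease)
        while t > 1e-4 and not _poisson_loglik(y, offset + Z @ (beta + t * step)) >= old:
            t *= 0.5
        if t > 1e-4:
            beta = beta + t * step
    return beta


def fit_poisson_pca(X, mask, K, n_outer=20, seed=0):
    """Rank-K Poisson PCA, log E[X_ij] = m_j + U[i] @ V[j], fitted where mask is True."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = 0.1 * rng.standard_normal((n, K))
    V = np.zeros((p, K))
    # start the intercepts at the log of the observed column means
    m = np.log((X * mask).sum(0) / np.maximum(mask.sum(0), 1) + 0.1)
    for _ in range(n_outer):
        # columns: Poisson GLM of X[:, j] on [1, U] gives (m_j, V[j])
        Z = np.hstack([np.ones((n, 1)), U])
        for j in range(p):
            obs = mask[:, j]
            beta = _poisson_glm_step(Z[obs], X[obs, j], np.zeros(obs.sum()), np.r_[m[j], V[j]])
            m[j], V[j] = beta[0], beta[1:]
        # rows: Poisson GLM of X[i, :] on V with offset m gives U[i]
        for i in range(n):
            obs = mask[i]
            U[i] = _poisson_glm_step(V[obs], X[i, obs], m[obs], U[i])
    Theta = m[None, :] + U @ V.T
    return U, V, m, _poisson_loglik(X[mask], Theta[mask])


def select_rank_bic(X, mask, K_max=4):
    """Pick K by -2 loglik + log(#observed) * #params (a BIC-type stand-in penalty)."""
    n, p = X.shape
    scores = {}
    for K in range(1, K_max + 1):
        *_, ll = fit_poisson_pca(X, mask, K)
        # rough count: intercepts + scores + loadings, minus K*K for U -> U A, V -> V A^{-T}
        n_par = p + K * (n + p - K)
        scores[K] = -2.0 * ll + np.log(mask.sum()) * n_par
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U0, V0 = 0.8 * rng.standard_normal((60, 2)), 0.8 * rng.standard_normal((20, 2))
    X = rng.poisson(np.exp(np.log(5.0) + U0 @ V0.T))  # simulated rank-2 counts
    mask = rng.random(X.shape) > 0.1                  # hide about 10% of entries
    best_K, _ = select_rank_bic(X, mask, K_max=3)
    print("selected number of PCs:", best_K)
```

Because each column and row update only ever accepts steps that do not lower the likelihood, the alternating scheme is monotone, although it can still stop at a local optimum. The paper's own penalty and fitting procedure may differ from the ones assumed here.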

Related content

PCA

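In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, the "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.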
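To make this definition concrete, here is a minimal NumPy sketch of classical PCA computed through the singular value decomposition (SVD) of the centred data matrix. The SVD route is a standard choice rather than something the text specifies, and `pca` is a hypothetical helper name. The rows of `Vt` are the orthonormal principal directions, ordered by the variance they capture. Projecting the data onto them gives scores whose sample covariance is diagonal, which is the "uncorrelated dimensions" property described above.

```python
import numpy as np


def pca(X, k):
    """Classical PCA of an n x p data matrix X, keeping k components."""
    Xc = X - X.mean(axis=0)                        # centre each column
    # SVD: Xc = U S Vt; rows of Vt are orthonormal directions sorted by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                            # k x p principal directions
    scores = Xc @ components.T                     # n x k projections onto them
    explained_var = s[:k] ** 2 / (X.shape[0] - 1)  # sample variance along each direction
    return components, scores, explained_var


rng = np.random.default_rng(0)
# correlated 3-D cloud in which most of the variance lies along one direction
X = rng.standard_normal((200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                               [1.0, 1.0, 0.0],
                                               [0.0, 0.0, 0.2]])
components, scores, explained_var = pca(X, 2)
print(np.round(np.cov(scores, rowvar=False), 6))  # off-diagonal entries are ~0
```

Minimizing the average squared distance to a line through the mean is equivalent to maximizing the variance of the projections onto it, so the first row of `components` is also the "best-fitting" line direction from the paragraph above.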
