Principal Component Analysis (PCA) is a well-known procedure for reducing the intrinsic complexity of a dataset, essentially by simplifying its covariance or correlation structure. We introduce a novel algebraic, model-based point of view and, in particular, provide an extension of PCA to distributions without second moments by formulating PCA as a best low-rank approximation problem. In contrast to existing approaches, the approximation is based on a kind of spectral representation rather than on the real space. Nonetheless, the prominent role of the eigenvectors is here reduced to defining the approximating surface and its maximal dimension. In this perspective, our approach is close to the original idea of Pearson (1901) and hence to autoencoders. Since variable selection in linear regression can be seen as a special case of our extension, our approach offers some insight into why the various variable selection methods, such as forward selection and best subset selection, cannot be expected to coincide. The linear regression model itself and PCA regression appear as limit cases.
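For orientation, the following is a minimal sketch of the classical, second-moment-based formulation that the abstract generalizes: PCA viewed as a best rank-k approximation of the centered data matrix, which by the Eckart–Young theorem is obtained by truncating its singular value decomposition. This illustrates only the standard starting point, not the paper's extension; the variable names and toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data matrix
Xc = X - X.mean(axis=0)                                   # center the columns

k = 2                                                     # target rank
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_k = U[:, :k] * s[:k] @ Vt[:k]                           # best rank-k approximation
# rows of Vt[:k] are the leading principal axes (eigenvectors of the covariance)

# Eckart-Young: no rank-k matrix is closer in Frobenius norm than the SVD
# truncation; the residual norm equals sqrt(sum of discarded squared singular values).
assert np.isclose(np.linalg.norm(Xc - X_k), np.sqrt(np.sum(s[k:] ** 2)))
```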