We propose a novel method for finding principal components in multivariate data sets that lie on a nonlinear Riemannian manifold embedded in a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA while capturing non-geodesic modes of variation in the data. We introduce the concept of a principal sub-manifold: a manifold that passes through the center of the data and that, at any point, extends in the direction of highest variation within the space spanned by the eigenvectors of the local tangent space PCA. Compared to recent work on the case where the sub-manifold has dimension one \citep{Panaretos2014}, which is essentially a curve on the manifold that attempts to capture one-dimensional variation, the current setting is much more general. The principal sub-manifold is therefore an extension of the principal flow that can capture higher-dimensional variation in the data. We show that, in Euclidean space, the principal sub-manifold yields the ball spanned by the usual principal components. By means of examples, we illustrate how to find, use, and interpret a principal sub-manifold, and we present an application in shape analysis.
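As a rough illustration of the "local tangent space PCA" ingredient mentioned above, the following sketch computes a kernel-weighted covariance of log-mapped data at a point on the unit sphere and extracts its eigenvectors. It is not the principal sub-manifold algorithm itself. The choice of manifold (the unit sphere), the Gaussian kernel, the bandwidth `h`, and the function names `log_map` and `local_tangent_pca` are all assumptions made for this example.

```python
import numpy as np

def log_map(p, x):
    """Log map on the unit sphere: tangent vector at p pointing toward x."""
    cos_t = np.clip(x @ p, -1.0, 1.0)
    theta = np.arccos(cos_t)          # geodesic distance from p to x
    v = x - cos_t * p                 # component of x orthogonal to p
    norm = np.linalg.norm(v)
    if norm < 1e-12:                  # x coincides with p
        return np.zeros_like(p)
    return theta * v / norm

def local_tangent_pca(p, data, h):
    """Eigen-decomposition of a kernel-weighted tangent covariance at p.

    The Gaussian kernel with bandwidth h is an assumed choice.
    Returns eigenvalues (descending) and matching eigenvectors as columns.
    """
    V = np.array([log_map(p, x) for x in data])
    d = np.linalg.norm(V, axis=1)     # geodesic distances to p
    w = np.exp(-0.5 * (d / h) ** 2)   # Gaussian weights
    C = (V * w[:, None]).T @ V / w.sum()
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

# Synthetic data: a thin band around the north pole, elongated along x.
t = np.linspace(-0.5, 0.5, 21)
s = np.linspace(-0.05, 0.05, 5)
T, S = np.meshgrid(t, s)
pts = np.stack([np.sin(T).ravel(), S.ravel(), np.cos(T).ravel()], axis=1)
data = pts / np.linalg.norm(pts, axis=1, keepdims=True)

p = np.array([0.0, 0.0, 1.0])
evals, evecs = local_tangent_pca(p, data, h=0.5)
```

In this toy setting the leading eigenvector at `p` points along the long axis of the band, and the eigenvector with (numerically) zero eigenvalue is normal to the sphere at `p`, since all log-mapped vectors lie in the tangent plane.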