Principal component analysis (PCA) is a workhorse of modern data science. Practitioners typically perform PCA assuming the data conforms to Euclidean geometry. However, for specific data types, such as hierarchical data, other geometries may be more appropriate. We study PCA in space forms; that is, spaces of constant positive (spherical), negative (hyperbolic), or zero (Euclidean) curvature. At any point on a Riemannian manifold, one can define a Riemannian affine subspace based on a set of tangent vectors and use invertible maps to carry tangent vectors to the manifold and vice versa. Finding a low-dimensional Riemannian affine subspace for a set of points in a space form amounts to dimensionality reduction because, as we show, any such affine subspace is isometric to a space form of the same dimension and curvature. To find principal components, we seek the (Riemannian) affine subspace that best represents a set of manifold-valued data points, in the sense of minimizing the average cost of projecting the data points onto it. We propose specific cost functions that bring about two major benefits: (1) the affine subspace can be estimated by solving an eigenequation, similar to that of Euclidean PCA, and (2) optimal affine subspaces of different dimensions form a nested set. These properties are an advance over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. For hyperbolic PCA specifically, the associated eigenequation operates in Lorentzian space, which is endowed with an indefinite inner product; we thus establish a connection between Lorentzian and Euclidean eigenequations. We evaluate the proposed space form PCA on data sets simulated in spherical and hyperbolic spaces and show that it outperforms alternative methods in convergence speed or accuracy, often both.
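To make the invertible maps between tangent vectors and the manifold concrete, the following is a minimal sketch for the unit sphere (constant curvature +1), using the standard exponential and logarithmic maps. The choice of these particular maps and the function names `exp_map` and `log_map` are assumptions made for illustration; the sketch is not taken from the paper and does not implement the proposed PCA.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp_map(p, v):
    """Send a tangent vector v at p (with <p, v> = 0) to the unit sphere."""
    t = norm(v)
    if t < 1e-12:
        return list(p)  # zero tangent vector maps to the base point
    return [math.cos(t) * pc + math.sin(t) * vc / t for pc, vc in zip(p, v)]

def log_map(p, x):
    """Send a point x on the unit sphere (x != -p) to a tangent vector at p."""
    c = max(-1.0, min(1.0, dot(p, x)))         # clamp for numerical safety
    theta = math.acos(c)                       # geodesic distance from p to x
    w = [xc - c * pc for xc, pc in zip(x, p)]  # part of x orthogonal to p
    n = norm(w)
    if n < 1e-12:
        return [0.0] * len(p)
    return [theta * wc / n for wc in w]

# Hypothetical example: base point at the north pole, tangent vector of length 0.5.
p = [0.0, 0.0, 1.0]
v = [0.3, -0.4, 0.0]
x = exp_map(p, v)        # a point on the sphere
v_back = log_map(p, x)   # recovers v, since the length of v is below pi
```

The round trip recovers the tangent vector only when its length is below pi, which is the range on which the exponential map on the unit sphere is invertible.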