A particularly challenging context for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. Such data appear, e.g., in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to dimensionality reduction for toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling to minimize the discrepancy between pairwise geodesic distances in both spaces. ST-PCA then resorts to principal nested spheres to obtain a nested sequence of subspheres that best fit the data, which can afterwards be mapped back to the torus. Numerical experiments illustrate how ST-PCA can be used to achieve meaningful dimensionality reduction on low-dimensional tori, particularly for the purpose of cluster separation, while two data applications in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus) show that ST-PCA outperforms existing methods for the investigated datasets.
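To make the distance-matching idea concrete, the following is a minimal sketch of the two kinds of pairwise geodesic distances involved and of one possible discrepancy measure between them. It is not the paper's implementation: the function names, the flat-torus metric with angles in radians, and the raw sum-of-squares stress are all assumptions chosen for illustration, and the actual ST-PCA criterion and optimization may differ.

```python
import numpy as np


def torus_geodesic_distances(theta):
    """Pairwise geodesic distances on the flat torus (angles in radians).

    Assumption: each coordinate is an angle, and the per-coordinate
    distance is the shorter arc between the two angles.
    """
    theta = np.asarray(theta, dtype=float)
    # Absolute angular differences, reduced modulo 2*pi.
    diff = np.abs(theta[:, None, :] - theta[None, :, :]) % (2 * np.pi)
    # Take the shorter way around each circle.
    diff = np.minimum(diff, 2 * np.pi - diff)
    return np.sqrt((diff ** 2).sum(axis=-1))


def sphere_geodesic_distances(x, radius=1.0):
    """Pairwise great-circle distances on a sphere of the given radius."""
    x = np.asarray(x, dtype=float)
    # Normalize rows so that only directions matter.
    u = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Clip to guard against rounding slightly outside [-1, 1].
    cosines = np.clip(u @ u.T, -1.0, 1.0)
    return radius * np.arccos(cosines)


def raw_stress(theta, x, radius=1.0):
    """Sum of squared differences between the two distance matrices.

    Hypothetical discrepancy measure; only pairs i < j are counted.
    """
    d_torus = torus_geodesic_distances(theta)
    d_sphere = sphere_geodesic_distances(x, radius)
    upper = np.triu_indices(len(d_torus), k=1)
    return float(((d_torus[upper] - d_sphere[upper]) ** 2).sum())
```

A multidimensional scaling step would then search over the sphere configuration `x` and the radius to make such a discrepancy small; the sketch only evaluates it for a given configuration.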