Scaled Torus Principal Component Analysis

A particularly challenging setting for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. Such data arise, for example, in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to dimensionality reduction for toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling so as to minimize the discrepancy between pairwise geodesic distances in the two spaces. ST-PCA then uses principal nested spheres to obtain a nested sequence of subspheres that best fits the data, which can afterwards be mapped back to the torus. Numerical experiments illustrate how ST-PCA achieves meaningful dimensionality reduction on low-dimensional tori, particularly for separating clusters, while two data applications, in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus), show that ST-PCA outperforms existing methods on the investigated datasets.
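The abstract does not spell out which torus metric or which discrepancy measure is used, so the following is only a minimal sketch under stated assumptions. It treats the torus as the flat product of circles [0, 2π)^d with the product (Euclidean) metric, uses great-circle distance on a sphere of radius r, and measures the discrepancy as a plain sum of squared differences (a classical "stress"). The helper names `torus_distance`, `sphere_distance` and `stress` are hypothetical; the actual multidimensional scaling optimization (choosing sphere coordinates and radius to minimize this quantity) and the principal nested spheres step are not shown.

```python
import math
from itertools import combinations

TWO_PI = 2.0 * math.pi


def torus_distance(a, b):
    """Geodesic distance on the flat torus [0, 2*pi)^d (assumed metric)."""
    total = 0.0
    for x, y in zip(a, b):
        diff = abs(x - y) % TWO_PI
        diff = min(diff, TWO_PI - diff)  # go the short way around each circle
        total += diff * diff
    return math.sqrt(total)


def sphere_distance(u, v, radius):
    """Great-circle distance between two points on a sphere of the given radius."""
    cos_angle = sum(p * q for p, q in zip(u, v)) / (radius * radius)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding outside [-1, 1]
    return radius * math.acos(cos_angle)


def stress(angles, sphere_points, radius):
    """Sum over all pairs of squared differences between torus and sphere distances.

    angles[i] is a point on the torus (tuple of d angles); sphere_points[i] is its
    hypothetical image on the sphere (tuple of d + 1 coordinates).
    """
    total = 0.0
    for i, j in combinations(range(len(angles)), 2):
        d_torus = torus_distance(angles[i], angles[j])
        d_sphere = sphere_distance(sphere_points[i], sphere_points[j], radius)
        total += (d_torus - d_sphere) ** 2
    return total
```

As a sanity check, on the one-dimensional torus (a circle) the map θ ↦ (cos θ, sin θ) onto the unit circle preserves geodesic distances exactly, so its stress is zero up to rounding, whereas using the same angles on a circle of radius 2 doubles every sphere distance and gives a clearly positive stress. In higher dimensions no such isometry to the sphere exists, which is why the radius and embedding have to be fitted.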


Related content

PCA


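In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by choosing directions that successively maximize the variance of the projected data. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line through the mean of the data that minimizes the average squared perpendicular distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.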
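A small illustration of this procedure in the plane, written in plain Python so that it needs no linear-algebra library: it centres the data, forms the 2×2 sample covariance matrix, and obtains its eigenvalues and eigenvectors in closed form. The leading eigenvector is the direction of the best-fitting line described above, and the second is perpendicular to it. The function names `pca_2d` and `project` are hypothetical; for more than two dimensions one would normally use an eigen- or singular-value decomposition from a numerical library instead.

```python
import math


def pca_2d(points):
    """Principal components of 2-D points (needs at least two points).

    Returns the mean, the two eigenvalues (largest first) and the two
    unit-length principal directions.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance entries of the centred data.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form.
    half_trace = (sxx + syy) / 2.0
    gap = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + gap, half_trace - gap
    # Angle of the eigenvector with the larger eigenvalue: tan(2*theta) = 2*sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # perpendicular to pc1
    return (mx, my), (lam1, lam2), pc1, pc2


def project(points, mean, direction):
    """Coordinates (scores) of the centred points along one principal direction."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1] for x, y in points]


pts = [(2.0, 1.0), (0.5, -0.3), (-1.0, -0.4), (-1.5, -0.3), (0.7, 0.9)]
mean, (lam1, lam2), pc1, pc2 = pca_2d(pts)
scores = project(pts, mean, pc1)  # 2-D data reduced to 1-D
```

Keeping only the scores along `pc1` reduces the data from two dimensions to one. The sample variance of those scores equals the larger eigenvalue, and the scores along `pc1` and `pc2` have zero sample covariance up to rounding, which is the "uncorrelated" property mentioned above.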
