Datasets with non-trivial large-scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large-scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers while preserving the large-scale topology. We formalize this point of view and, as an application, describe an algorithm that takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large-scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in a lower target dimension than various well-known metric-based dimensionality reduction algorithms.
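To make the "local linear dimensionality reduction" ingredient concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes that the cover of the data is built from balls in the initial representation, centered at landmarks chosen by greedy farthest-point sampling, and that each local representation is an ordinary PCA of the corresponding patch. The function names `farthest_point_sample` and `local_pca_charts` and the parameters `n_charts`, `radius`, and `fiber_dim` are hypothetical and introduced only for illustration.

```python
import numpy as np


def farthest_point_sample(points, k, seed=0):
    """Greedy farthest-point sampling; returns the indices of k landmarks."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # point farthest from all landmarks so far
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(idx)


def local_pca_charts(data, initial_rep, n_charts, radius, fiber_dim):
    """Cover the data using balls in the initial representation (an assumption
    of this sketch) and fit a PCA of dimension fiber_dim on each patch."""
    landmarks = farthest_point_sample(initial_rep, n_charts)
    charts = []
    for c in landmarks:
        # Patch membership is decided in the initial (global) representation.
        gaps = np.linalg.norm(initial_rep - initial_rep[c], axis=1)
        members = np.flatnonzero(gaps <= radius)
        patch = data[members]
        mean = patch.mean(axis=0)
        # Rows of vt are the principal directions of the centered patch.
        # A patch with fewer than fiber_dim points yields fewer directions.
        _, _, vt = np.linalg.svd(patch - mean, full_matrices=False)
        basis = vt[:fiber_dim].T  # shape (ambient_dim, fiber_dim)
        coords = (patch - mean) @ basis  # local low-dimensional coordinates
        charts.append(
            {"members": members, "mean": mean, "basis": basis, "coords": coords}
        )
    return charts
```

Each chart stores the indices of its patch, an orthonormal basis of the fitted local subspace, and the local coordinates of the patch. The sketch stops there: it does not show the step that makes the local representations agree on overlapping patches and integrates them along the initial global representation.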