In a world abundant with diverse data arising from complex acquisition techniques, there is a growing need for new data analysis methods. In this paper, we focus on high-dimensional data that are organized into several hierarchical datasets. We assume that each dataset consists of complex samples, and that every sample has a distinct irregular structure modeled by a graph. The main novelty of this work lies in combining two powerful, complementary data-analytic approaches: topological data analysis (TDA) and geometric manifold learning. Geometry primarily captures local information, while topology inherently provides global descriptors. Based on this combination, we present a method for building an informative representation of hierarchical datasets. At the finer (sample) level, we devise a new metric between samples based on manifold learning that facilitates quantitative structural analysis. At the coarser (dataset) level, we employ TDA to extract qualitative structural information from the datasets. We showcase the applicability and advantages of our method on simulated data and on a corpus of hyper-spectral images. We show that an ensemble of hyper-spectral images exhibits a hierarchical structure that fits the considered setting well. In addition, we show that our new method yields superior classification results compared with state-of-the-art methods.