Dimensionality reduction is crucial both for visualization and for preprocessing high-dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest-neighbor graphs in the original space, which is used to preserve the grouping properties of the data distribution at multiple levels. The core of the proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP in performance and visualization quality while being an order of magnitude faster in run-time. Furthermore, its interpretable mechanics, the ability to project new data, and the natural separation of data clusters in visualizations make it a general-purpose unsupervised dimension reduction technique. In the paper, we argue for the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes ranging from 1K to 11M samples and dimensions from 28 to 16K. We perform comparisons with other state-of-the-art methods on multiple metrics and target dimensions, highlighting its efficiency and performance. Code is available at https://github.com/koulakis/h-nne
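To make the idea of "a hierarchy built on 1-nearest-neighbor graphs" concrete, here is a minimal sketch, assuming a FINCH-style grouping scheme: at each level every node is linked to its first nearest neighbor, the connected components of that graph become clusters, and each cluster is replaced by the mean of its original points before the next level is built. This is an illustration under that assumption, not the authors' implementation; the function names `one_nn_partition` and `build_hierarchy` are hypothetical, and the brute-force neighbor search is only suitable for tiny inputs.

```python
from math import dist
from typing import Dict, List, Sequence

Point = Sequence[float]


def one_nn_partition(points: Sequence[Point]) -> List[int]:
    """Link every point to its first nearest neighbor and return the
    connected-component label of each point (labels are 0, 1, 2, ...)."""
    n = len(points)
    if n < 2:
        return [0] * n
    parent = list(range(n))  # union-find forest over the points

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force 1-NN search, O(n^2); assumed for clarity, a real system would use an index.
        j = min((k for k in range(n) if k != i), key=lambda k: dist(points[i], points[k]))
        parent[find(i)] = find(j)  # the edge i -> j merges the two components

    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def build_hierarchy(points: Sequence[Point]) -> List[List[int]]:
    """Repeat the 1-NN grouping on cluster means until one cluster remains.
    Returns, for each level, the cluster label of every original point."""
    members = [[i] for i in range(len(points))]  # original indices behind each node
    nodes = [tuple(p) for p in points]
    levels: List[List[int]] = []
    while len(nodes) > 1:
        labels = one_nn_partition(nodes)
        grouped: List[List[int]] = [[] for _ in range(max(labels) + 1)]
        for node, lab in enumerate(labels):
            grouped[lab].extend(members[node])
        point_labels = [0] * len(points)
        for lab, idxs in enumerate(grouped):
            for idx in idxs:
                point_labels[idx] = lab
        levels.append(point_labels)
        # Hypothetical choice: represent each cluster by the mean of its original points.
        nodes = [
            tuple(sum(coord) / len(idxs) for coord in zip(*(points[i] for i in idxs)))
            for idxs in grouped
        ]
        members = grouped
    return levels
```

Because every component of a 1-nearest-neighbor graph contains at least two nodes, the number of clusters at least halves from one level to the next, so the hierarchy has logarithmically many levels. For three well-separated pairs of points, the first level recovers the three pairs and the second level merges them into a single cluster.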