Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMAP have demonstrated impressive visualization performance on many real-world datasets. One tension that has always faced these methods is the trade-off between preserving global structure and preserving local structure: they can handle one or the other, but not both. In this work, our main goal is to understand which aspects of DR methods are important for preserving both local and global structure, since it is difficult to design a better method without a true understanding of the choices we make in our algorithms and their empirical impact on the lower-dimensional embeddings they produce. Towards the goal of local structure preservation, we provide several useful design principles for DR loss functions based on our new understanding of the mechanisms behind successful DR methods. Towards the goal of global structure preservation, our analysis shows that the choice of which components to preserve is important. We leverage these insights to design a new algorithm for DR, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure. Our work provides several unexpected insights into which design choices to make and which to avoid when constructing DR algorithms.