Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with the UMAP algorithm, explaining the neighborhood probabilities in the input and embedding spaces, the optimization of the cost function, the training algorithm, the derivation of the gradients, and supervised and semi-supervised embedding by UMAP. We then present the theory behind UMAP in terms of algebraic topology and category theory. Next, we introduce UMAP as a neighbor embedding method and compare it with the t-SNE and LargeVis algorithms. We discuss negative sampling and repulsive forces in UMAP's cost function. DensMAP is then explained for density-preserving embedding. Finally, we introduce parametric UMAP for embedding by deep learning and progressive UMAP for streaming and out-of-sample data embedding.