UMAP is a non-parametric, graph-based dimensionality reduction algorithm that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) compute a graphical representation of a dataset (a fuzzy simplicial complex), and (2) optimize a low-dimensional embedding of the graph through stochastic gradient descent. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data. Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
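The two steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fuzzy_knn_graph`, `umap_loss_and_grad`, `fit_linear_encoder`) are hypothetical, the neighbourhood scale is a crude mean-distance heuristic rather than UMAP's calibrated bandwidth, the kernel constants are fixed at a = b = 1, the neural network is replaced by a single linear map, and full-batch gradient descent with backtracking stands in for stochastic gradient descent with edge and negative sampling.

```python
import numpy as np

def fuzzy_knn_graph(X, k=5):
    """Step 1 (simplified): symmetric fuzzy kNN graph with weights in [0, 1]."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # exclude self-neighbours
    idx = np.argsort(d, axis=1)[:, :k]               # k nearest neighbours per point
    rows = np.repeat(np.arange(n), k)
    knn_d = d[rows, idx.ravel()].reshape(n, k)
    rho = knn_d[:, :1]                               # distance to nearest neighbour
    sigma = knn_d.mean(axis=1, keepdims=True) + 1e-12  # crude scale (assumption)
    w = np.exp(-np.maximum(knn_d - rho, 0.0) / sigma)
    P = np.zeros((n, n))
    P[rows, idx.ravel()] = w.ravel()
    return np.clip(P + P.T - P * P.T, 0.0, 1.0)      # fuzzy union, symmetric

def umap_loss_and_grad(W, X, P, eps=1e-4):
    """Step 2 objective: cross-entropy between graph weights P and the
    embedding similarities Q, with the embedding given by Z = X @ W."""
    Z = X @ W                                        # parametric (linear) encoder
    diff = Z[:, None, :] - Z[None, :, :]
    d2 = (diff ** 2).sum(-1) + eps                   # smoothed squared distances
    Q = 1.0 / (1.0 + d2)                             # UMAP-style kernel, a = b = 1
    off = ~np.eye(len(X), dtype=bool)                # ignore self-pairs
    loss = -(P * np.log(Q) + (1 - P) * np.log(1 - Q))[off].sum()
    G = (P * Q - (1 - P) * Q / d2) * off             # d loss / d d2_ij
    grad_Z = 4.0 * (G[:, :, None] * diff).sum(axis=1)  # G is symmetric
    return loss, X.T @ grad_Z                        # chain rule through Z = X @ W

def fit_linear_encoder(X, P, dim=2, steps=200, lr=1e-2, seed=0):
    """Full-batch gradient descent with backtracking (a stand-in for SGD)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    loss, grad = umap_loss_and_grad(W, X, P)
    history = [loss]
    for _ in range(steps):
        W_new = W - lr * grad
        new_loss, new_grad = umap_loss_and_grad(W_new, X, P)
        if new_loss < loss:                          # accept the step
            W, loss, grad = W_new, new_loss, new_grad
            lr *= 1.1
        else:                                        # reject and shrink the step
            lr *= 0.5
        history.append(loss)
    return W, history

# Usage on synthetic data: two Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(4.0, 1.0, (20, 5))])
P = fuzzy_knn_graph(X, k=5)
W, history = fit_linear_encoder(X, P)

# The learned mapping embeds unseen points without re-running the optimization.
X_new = rng.normal(4.0, 1.0, (3, 5))
Z_new = X_new @ W
```

Because the embedding is a function of the weights `W` rather than a free coordinate per data point, new data are embedded with a single forward pass. This is the property the abstract refers to as fast online embedding.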