The neighbor embedding methods $t$-SNE and UMAP are the de facto standard for visualizing high-dimensional datasets. Because they are motivated from entirely different viewpoints, their loss functions appear to be unrelated. In practice, they yield strongly differing embeddings and can suggest conflicting interpretations of the same data. The fundamental reasons for this discrepancy and, more generally, the exact relationship between $t$-SNE and UMAP have remained unclear. In this work, we uncover their conceptual connection via a new insight into contrastive learning methods. Noise-contrastive estimation can be used to optimize $t$-SNE, while UMAP relies on negative sampling, another contrastive method. We find the precise relationship between these two contrastive methods and provide a mathematical characterization of the distortion introduced by negative sampling. Visually, this distortion results in UMAP generating more compact embeddings with tighter clusters than $t$-SNE. We exploit this new conceptual connection to propose and implement a generalization of negative sampling, which allows us to interpolate between (and even extrapolate beyond) $t$-SNE and UMAP and their respective embeddings. Moving along this spectrum of embeddings leads to a trade-off between discrete/local and continuous/global structures, mitigating the risk of over-interpreting ostensible features of any single embedding. We provide a PyTorch implementation.
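As an illustration of the two contrastive objectives mentioned above, the following is a minimal sketch in plain Python, not the PyTorch implementation referred to in the abstract. It uses the textbook forms of the two losses: noise-contrastive estimation classifies a pair as data versus noise with posterior $\phi / (\phi + c)$, and negative sampling uses the same expression with $c = 1$ (a sigmoid of $\log \phi$). The function names, the Cauchy similarity, and the single knob `c` (standing in for the product of normalization constant, number of noise samples, and noise density) are assumptions made for this example only.

```python
import math


def cauchy_similarity(dist):
    """Unnormalized heavy-tailed similarity of two embedding points at distance dist."""
    return 1.0 / (1.0 + dist * dist)


def posterior(phi, c):
    """Probability that a pair with similarity phi is a neighbor pair rather than noise.

    c is a hypothetical knob for this sketch; c = 1 gives the negative-sampling form.
    """
    return phi / (phi + c)


def contrastive_loss(pos_dists, neg_dists, c=1.0):
    """Sum of binary cross-entropy terms over neighbor pairs and noise pairs."""
    loss = 0.0
    for d in pos_dists:  # attractive terms: neighbor pairs should look like data
        loss -= math.log(posterior(cauchy_similarity(d), c))
    for d in neg_dists:  # repulsive terms: sampled pairs should look like noise
        loss -= math.log(1.0 - posterior(cauchy_similarity(d), c))
    return loss


if __name__ == "__main__":
    pos = [0.5, 1.0]            # made-up distances of neighbor pairs
    neg = [0.8, 2.0, 3.0, 4.0]  # made-up distances of noise pairs
    for c in (0.1, 1.0, 10.0):
        print(c, contrastive_loss(pos, neg, c))
```

Changing `c` shifts the balance between the attractive and repulsive terms, which is the kind of single-parameter family the abstract describes. How specific values of such a parameter map onto $t$-SNE or UMAP is not derivable from this sketch.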