Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using $k$NN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE using the exaggeration parameter yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures and yields higher $k$NN recall. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimization strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lie Laplacian Eigenmaps. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto the attraction-repulsion spectrum, and highlight the inherent trade-offs between them.
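The attraction-repulsion balance described above can be illustrated with a minimal NumPy sketch of the standard t-SNE gradient, in which an exaggeration factor multiplies only the attractive term. This is an illustration under stated assumptions rather than the authors' implementation: the function name `tsne_forces`, the parameter name `rho`, and the dense $O(n^2)$ computation are hypothetical choices made here for clarity, and `P` is assumed to be a symmetric affinity matrix with zero diagonal that sums to 1.

```python
import numpy as np


def tsne_forces(Y, P, rho=1.0):
    """Attractive and repulsive forces on each point of a t-SNE embedding.

    Y   : (n, d) array of embedding coordinates.
    P   : (n, n) symmetric high-dimensional affinities, zero diagonal, sum 1.
    rho : exaggeration factor applied to the attractive term only
          (hypothetical name; rho = 1 is standard t-SNE).

    Forces are the negative of the usual t-SNE gradient terms.
    """
    diff = Y[:, None, :] - Y[None, :, :]        # y_i - y_j, shape (n, n, d)
    w = 1.0 / (1.0 + np.sum(diff**2, axis=-1))  # Cauchy kernel w_ij
    np.fill_diagonal(w, 0.0)                    # no self-interaction
    Z = w.sum()                                 # normalizer, so q_ij = w_ij / Z

    # Attraction pulls i toward j, weighted by rho * p_ij * w_ij.
    attraction = -4.0 * rho * np.einsum("ij,ijk->ik", P * w, diff)
    # Repulsion pushes i away from j, weighted by q_ij * w_ij.
    repulsion = 4.0 * np.einsum("ij,ijk->ik", (w / Z) * w, diff)
    return attraction, repulsion


# Toy usage: random affinities and a random 2D layout.
rng = np.random.default_rng(0)
n = 50
A = rng.random((n, n))
P = A + A.T                 # symmetrize
np.fill_diagonal(P, 0.0)
P /= P.sum()                # normalize to sum to 1
Y = rng.normal(size=(n, 2))

attr_1, rep_1 = tsne_forces(Y, P, rho=1.0)
attr_4, rep_4 = tsne_forces(Y, P, rho=4.0)

# A gradient-descent step would move Y along attraction + repulsion.
step = 0.1 * (attr_4 + rep_4)
```

In this sketch, raising `rho` scales the attractive forces linearly and leaves the repulsive forces unchanged, which is the sense in which exaggeration shifts an embedding along the attraction-repulsion spectrum; the limit of very large `rho`, where repulsion becomes negligible, is the regime the abstract associates with Laplacian Eigenmaps.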