We propose an algorithm for visualizing a dataset by embedding it in 3-dimensional Euclidean space, given several different distances defined on the same pairs of datapoints. The algorithm, ENS-t-SNE, generalizes t-distributed Stochastic Neighbor Embedding (t-SNE) to find an Embedding that preserves Neighborhoods Simultaneously for all of the given distances. We illustrate the utility of ENS-t-SNE with three applications. First, it visualizes different notions of clusters and groups within the same high-dimensional dataset in a single 3-dimensional embedding, rather than in separate embeddings of the same data whose corresponding points must then be matched. Second, it illustrates the effects of different hyper-parameters of classical t-SNE. Third, by considering multiple notions of clustering in the data, it can produce an embedding that differs from the one obtained with classical t-SNE. We provide an extensive quantitative evaluation on real-world and synthetic datasets of different sizes, using different numbers of projections.