Neighbor Embedding (NE), which aims to preserve pairwise similarities between data items, has proven to be an effective principle for data visualization. However, even the best current NE methods, such as Stochastic Neighbor Embedding (SNE), may leave large-scale patterns such as clusters hidden despite strong signals being present in the data. To address this, we propose a new cluster visualization method based on Neighbor Embedding. We first present a family of Neighbor Embedding methods that generalizes SNE by using the non-normalized Kullback-Leibler divergence with a scale parameter. In this family, much better cluster visualizations often appear at a parameter value different from the one corresponding to SNE. We also develop efficient software that employs asynchronous stochastic block coordinate descent to optimize the new family of objective functions. The experimental results demonstrate that our method consistently and substantially improves the visualization of data clusters compared with state-of-the-art NE approaches.
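The relationship between a non-normalized Kullback-Leibler divergence with a scale parameter and the normalized divergence used in SNE-style objectives can be illustrated with a small sketch. The abstract does not give the exact parameterization, so the form below, D_I(p || s·q) = Σ [p log(p / (s·q)) − p + s·q], is an assumption based on the standard generalized KL divergence. The function names and toy similarity values are hypothetical. For input similarities that sum to 1, choosing the scale s = Σp / Σq recovers the ordinary normalized KL divergence, while other values of s give different members of the family.

```python
import math

def generalized_kl(p, q, s=1.0):
    """Non-normalized (generalized) KL divergence D_I(p || s*q).

    p and q are equal-length lists of positive weights.
    Neither list needs to sum to 1. s is the scale parameter (assumed form).
    """
    return sum(pi * math.log(pi / (s * qi)) - pi + s * qi
               for pi, qi in zip(p, q))

def normalized_kl(p, q):
    """Ordinary KL divergence after normalizing p and q to sum to 1."""
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq))
               for pi, qi in zip(p, q))

# Toy data (illustrative only): input similarities summing to 1,
# and unnormalized output similarities.
p = [0.5, 0.3, 0.2]
q = [2.0, 1.0, 1.0]

# Closed-form minimizer of D_I(p || s*q) over s: set the derivative
# -sum(p)/s + sum(q) to zero.
s_star = sum(p) / sum(q)

print("normalized KL:        ", normalized_kl(p, q))
print("generalized KL at s*: ", generalized_kl(p, q, s_star))
print("generalized KL at s=1:", generalized_kl(p, q, 1.0))
```

In this sketch the first two printed values agree up to floating-point error, and any other choice of s yields a larger divergence for the same q. Treating s as a free parameter rather than fixing it at this minimizer is what produces a family of objectives instead of a single one.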