Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing the intrinsic structures in the data. They are an indispensable part of modern data science, and many dimension reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important to evaluate their relative performance for a given dataset, and to leverage and combine their individual strengths. In this paper, we propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure, the visualization eigenscore, of the relative performance of the visualizations in preserving the structure around each data point. It then leverages the eigenscores to obtain a consensus visualization, which has much improved quality over the individual visualizations in capturing the underlying true data structure. Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple simulated and real-world datasets from diverse applications to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization. Furthermore, we establish a rigorous theoretical justification of our method based on a general statistical framework, yielding fundamental principles behind the empirical success of consensus visualization along with practical guidance.
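The abstract does not spell out how the eigenscores or the consensus are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for each data point, every visualization is summarized by its normalized vector of distances to all other points, that the eigenscores are the entries of the leading eigenvector of the cosine-similarity matrix between those vectors, and that the consensus is an eigenscore-weighted average of the normalized distances. The function names `pairwise_distances` and `eigenscores_and_consensus` are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix (n, n) for an embedding X of shape (n, d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def eigenscores_and_consensus(embeddings):
    """Assumed construction: per-point spectral weighting of K visualizations.

    embeddings: list of K arrays, each of shape (n, d_k), same row order.
    Returns (scores, consensus): scores has shape (n, K), consensus is a
    symmetric (n, n) matrix of combined, normalized distances.
    """
    K = len(embeddings)
    n = embeddings[0].shape[0]
    D = np.stack([pairwise_distances(E) for E in embeddings])  # (K, n, n)

    scores = np.zeros((n, K))
    consensus = np.zeros((n, n))
    for i in range(n):
        # Distances from point i to every point, one row per visualization.
        V = D[:, i, :]
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        V = V / np.where(norms > 0, norms, 1.0)

        # Cosine similarity between visualizations around point i.
        G = V @ V.T

        # eigh returns eigenvalues in ascending order, so the last column
        # is the leading eigenvector; G is entrywise nonnegative, so that
        # eigenvector can be taken nonnegative.
        _, U = np.linalg.eigh(G)
        u = np.abs(U[:, -1])
        scores[i] = u

        # Eigenscore-weighted average of the normalized distance vectors.
        consensus[i] = (u / u.sum()) @ V

    # Symmetrize so the result can be used as a distance-like matrix.
    consensus = (consensus + consensus.T) / 2
    return scores, consensus


# Toy usage: two faithful visualizations (a point cloud and a rotated copy)
# and one that is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
scores, consensus = eigenscores_and_consensus(
    [X, X @ R.T, rng.normal(size=(50, 2))]
)
print(scores.mean(axis=0))  # average eigenscore of each visualization
```

Under these assumptions, visualizations that agree with one another around a point receive larger eigenscores there, so the noise embedding in the toy example is down-weighted. The consensus matrix could then be passed to any method that accepts precomputed distances to draw the final low-dimensional picture.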