Data visualization is the process by which data of any size or dimensionality is transformed into a lower-dimensional representation that people can manipulate and understand more easily. The goal of this paper is to survey the performance of current high-dimensional data visualization techniques and to quantify their strengths and weaknesses through relevant quantitative measures, including runtime, memory usage, clustering quality, separation quality, global structure preservation, and local structure preservation. To perform the analysis, we select a subset of state-of-the-art methods. Our work shows how the selected algorithms produce embeddings with distinct qualities that suit them to particular tasks, and how each of these algorithms is constrained by compute resources.
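To make three of the listed measures concrete, the following is a minimal sketch of how runtime, memory usage, and local structure preservation might be recorded for one embedding method. It is an illustration under stated assumptions, not the evaluation protocol used in the paper: the abstract does not specify how each measure is computed. The k-nearest-neighbour overlap score is one common proxy for local structure preservation, `toy_embed` is a hypothetical stand-in for a real visualization algorithm, and the helper names `knn_preservation` and `profile` are invented for this example.

```python
import math
import random
import time
import tracemalloc


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(order[:k]))
    return neighbours


def knn_preservation(high, low, k=5):
    """Mean fraction of each point's k nearest neighbours kept in the embedding."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    overlaps = [len(a & b) / k for a, b in zip(nn_high, nn_low)]
    return sum(overlaps) / len(overlaps)


def toy_embed(points, dims=2):
    """Hypothetical stand-in for a real method: keep the first `dims` coordinates."""
    return [p[:dims] for p in points]


def profile(embed, points):
    """Run `embed` once and return (embedding, seconds, peak_bytes).

    Assumption: tracemalloc only tracks allocations made through Python's
    allocator, so memory used inside native libraries may be under-reported.
    """
    tracemalloc.start()
    start = time.perf_counter()
    embedding = embed(points)
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return embedding, seconds, peak_bytes


# Synthetic data: 60 points in 10 dimensions (illustrative only, not paper data).
random.seed(0)
data = [[random.gauss(0, 1) for _ in range(10)] for _ in range(60)]

embedding, seconds, peak_bytes = profile(toy_embed, data)
score = knn_preservation(data, embedding, k=5)
print(f"runtime={seconds:.6f}s peak={peak_bytes}B knn_preservation={score:.3f}")
```

A score of 1.0 means every point keeps all of its k nearest neighbours in the embedding, and lower values indicate that local neighbourhoods were distorted. Swapping `toy_embed` for a real method with the same list-in, list-out shape would let the same harness compare several algorithms under identical conditions.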