We provide a rigorous mathematical treatment of the crowding problem that arises in data visualization when high-dimensional data sets are projected down to low dimensions. By properly adjusting the capacity of high-dimensional balls, our method creates just enough room for the embedding. A key component of the proposed method is an estimate of the correlation dimension at multiple scales, which reflects variations in data density. The proposed capacity adjustment applies to any distance (Euclidean, geodesic, or diffusion) and can potentially be incorporated into many existing methods to mitigate crowding during dimension reduction. We demonstrate the effectiveness of the new method on synthetic and real data sets.
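The multi-scale correlation dimension mentioned above can be illustrated with a minimal sketch. It assumes the standard Grassberger-Procaccia correlation integral C(r), the fraction of point pairs closer than r, with the local dimension taken as the slope of log C(r) against log r between consecutive radii. The abstract does not specify the estimator, so this is not the authors' method. The function names `correlation_integral` and `local_correlation_dimension` are hypothetical, and Euclidean distances are used for simplicity.

```python
import numpy as np


def correlation_integral(X, radii):
    """Return C(r) for each r: the fraction of distinct point pairs closer than r.

    Uses Euclidean distances. Substituting a precomputed geodesic or diffusion
    distance matrix for `dist` is an assumption, not taken from the source.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # n x n distance matrix
    iu = np.triu_indices(X.shape[0], k=1)         # each unordered pair once
    pair_d = dist[iu]
    return np.array([(pair_d < r).mean() for r in radii])


def local_correlation_dimension(X, radii):
    """Slope of log C(r) versus log r between consecutive radii.

    Returns one value per scale interval. Each slope approximates the
    dimension that the data exhibit at that scale.
    """
    radii = np.asarray(radii, dtype=float)
    C = correlation_integral(X, radii)
    if np.any(C == 0):
        raise ValueError("no pairs fall within the smallest radius; use larger radii")
    return np.diff(np.log(C)) / np.diff(np.log(radii))


if __name__ == "__main__":
    # A 30 x 30 grid on a plane embedded in 3-D, so the intrinsic dimension is 2.
    t = np.linspace(0.0, 1.0, 30)
    gx, gy = np.meshgrid(t, t)
    plane = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
    print(local_correlation_dimension(plane, [0.1, 0.2, 0.3]))
```

For the planar grid, the printed slopes should be close to, and somewhat below, 2. Points near the edge of the square have fewer neighbors, and this boundary effect grows with r. That scale dependence is the kind of variation a multi-scale estimate is meant to expose. The full n x n distance matrix keeps the sketch short, but it scales quadratically in memory.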