In ordinary Dimensionality Reduction (DR), each data instance in an m-dimensional space (the original space) is mapped to one point in a d-dimensional space (the visual space), preserving distance and/or neighborhood relationships as much as possible. Despite their popularity, existing DR techniques may unavoidably produce misleading visual representations, even for simple datasets. The problem lies not with the existing solutions but with the problem formulation. For a two-dimensional visual space, if the data instances are not coplanar or do not lie on a 2D manifold, the problem has no exact solution, and the possible approximations usually result in layouts with inaccurate distance preservation and overlapping neighborhoods. In this paper, we elaborate on the concept of Multi-point Dimensionality Reduction, in which each data instance can be mapped to more than one point in the visual space, and we provide the first general solution to it as a step toward mitigating this issue. By duplicating points, background information is added to the visual representation, making local neighborhoods in the visual space more faithful to the original space. Our solution, named Red Gray Plus, builds upon and extends a combination of ordinary DR and graph drawing techniques. We show not only that Multi-point Dimensionality Reduction can be one potential direction for improving the reliability of DR layouts, but also that our initial solution to the problem quantitatively outperforms popular ordinary DR methods.
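The claim that non-coplanar data admits no exact two-dimensional layout can be illustrated with a small numerical sketch. The example below is not the Red Gray Plus method; it uses classical multidimensional scaling (MDS) as a stand-in for an ordinary DR technique, and the helper names `pairwise_distances` and `classical_mds` are hypothetical names chosen for this illustration. The data is a regular tetrahedron, four points in 3D that are all equidistant from one another, a configuration that cannot be reproduced by any four points in a plane.

```python
import numpy as np


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, d):
    """Embed a distance matrix into d dimensions with classical MDS.

    Used here only as an example of an ordinary (one point per instance) DR
    technique; it is an assumption of this sketch, not the paper's method.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering turns squared distances into a Gram (inner product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]    # indices of the d largest
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Regular tetrahedron: four non-coplanar points, all pairwise distances equal.
tetra = np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
)
original = pairwise_distances(tetra)

layout_2d = classical_mds(original, 2)
layout_3d = classical_mds(original, 3)

# Largest absolute error in any pairwise distance after the embedding.
err2 = np.abs(pairwise_distances(layout_2d) - original).max()
err3 = np.abs(pairwise_distances(layout_3d) - original).max()

print("max distance error, 2D layout:", err2)
print("max distance error, 3D layout:", err3)
```

In this sketch the three-dimensional embedding recovers the distances up to floating-point error, while the two-dimensional layout necessarily distorts at least one pair. This is the kind of unavoidable inaccuracy that motivates mapping a data instance to more than one point in the visual space.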