Dimensionality reduction and clustering are often used as preliminary steps in many complex machine learning tasks. The presence of noise and outliers can degrade the performance of such preprocessing and therefore severely impair the subsequent analysis. In manifold learning, several studies propose solutions for removing background noise, or noise close to the structure, when the density of the structure is substantially higher than that of the noise. However, in many applications, including astronomical datasets, the density varies along manifolds that are buried in a noisy background. We propose a novel method, based on the idea of ant colony optimization, to extract manifolds in the presence of noise. In contrast to existing random-walk solutions, our technique captures points that are locally aligned with the major directions of the manifold. Moreover, we show empirically that the biologically inspired formulation of ant pheromone reinforces this behavior, enabling the method to recover multiple manifolds embedded in extremely noisy data clouds. We demonstrate the performance of the algorithm, compared with state-of-the-art approaches for noise reduction in manifold detection and clustering, on several synthetic and real datasets, including an N-body simulation of a cosmological volume.
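The abstract describes the method only at a high level. The following is a minimal, hypothetical sketch of the general ant-colony idea it builds on, not the authors' algorithm: ants walk on a k-nearest-neighbor graph; each step favors neighbors that carry more pheromone and lie along the local major direction (estimated here by local PCA); and pheromone evaporates at rate `rho` and is reinforced by visits. The function names (`local_directions`, `ant_pheromone`), the parameters (`k`, `n_ants`, `n_steps`, `n_iter`, `rho`, `alpha`, `beta`) and the toy dataset are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical ACO-style sketch; NOT the method from the paper.


def local_directions(X, k):
    """kNN indices and the leading local-PCA direction at every point."""
    # Brute-force pairwise distances; O(n^2) memory, fine for small toy data.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]  # column 0 is normally the point itself
    dirs = np.empty_like(X)
    for i in range(len(X)):
        P = X[nbrs[i]] - X[nbrs[i]].mean(axis=0)
        _, _, vt = np.linalg.svd(P, full_matrices=False)
        dirs[i] = vt[0]  # unit vector of largest local variance
    return nbrs, dirs


def ant_pheromone(X, k=10, n_ants=50, n_steps=20, n_iter=30,
                  rho=0.1, alpha=1.0, beta=2.0, seed=0):
    """Pheromone level per point after an ACO-style walk on the kNN graph."""
    rng = np.random.default_rng(seed)
    n = len(X)
    nbrs, dirs = local_directions(X, k)
    tau = np.ones(n)  # uniform initial pheromone
    for _ in range(n_iter):
        deposit = np.zeros(n)
        for _ in range(n_ants):
            i = rng.integers(n)  # each ant starts at a random point
            for _ in range(n_steps):
                cand = nbrs[i]
                steps = X[cand] - X[i]
                steps /= np.linalg.norm(steps, axis=1, keepdims=True) + 1e-12
                # |cos| between each candidate step and the local major direction
                align = np.abs(steps @ dirs[i])
                # Transition weight: pheromone^alpha * alignment^beta (kept > 0)
                w = tau[cand] ** alpha * align ** beta + 1e-12
                i = rng.choice(cand, p=w / w.sum())
                deposit[i] += 1.0
        # Evaporation followed by reinforcement of the visited points.
        tau = (1.0 - rho) * tau + deposit / n_ants
    return tau


# Toy data: a dense, slightly jittered line buried in uniform background noise.
data_rng = np.random.default_rng(1)
line = np.c_[np.linspace(0, 1, 200), 0.5 + 0.002 * data_rng.standard_normal(200)]
noise = data_rng.uniform(0, 1, size=(100, 2))
X = np.vstack([line, noise])

tau = ant_pheromone(X)
print("mean pheromone, line :", tau[:200].mean())
print("mean pheromone, noise:", tau[200:].mean())
```

In this sketch, points on the dense, locally aligned structure should accumulate more pheromone than background points, and thresholding `tau` would then give a crude manifold/noise split. The brute-force distance matrix scales quadratically, so for large point sets such as N-body catalogues a spatial index (for example a k-d tree) would be preferable.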