Although manifold-based clustering has become a popular research topic, we observe that existing work has overlooked one important factor: the clustering loss it defines may corrupt the local and global structure of the latent space. In this paper, we propose a novel Generalized Clustering and Multi-manifold Learning (GCML) framework with geometric structure preservation for generalized data, that is, data not limited to 2-D images, with a wide range of applications in the speech, text, and biology domains. In the proposed framework, manifold clustering is performed in the latent space under the guidance of a clustering loss. To prevent the clustering-oriented loss from degrading the geometric structure of the latent space, we propose an isometric loss that preserves intra-manifold structure locally and a ranking loss that preserves inter-manifold structure globally. Extensive experiments show that GCML outperforms its counterparts in both qualitative visualizations and quantitative metrics, which demonstrates the effectiveness of preserving geometric structure.
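As a rough illustration of how the three loss terms could interact, the NumPy sketch below computes one plausible form of each. It is not the formulation used in GCML: the function names, the k-means-style clustering term, the k-nearest-neighbor isometric term, and the order-flip ranking score are all assumptions made for illustration, and the ranking score here is a non-differentiable diagnostic rather than a trainable loss.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def clustering_loss(z, centers, labels):
    """Assumed form: mean squared distance from each latent point to its center."""
    return float(((z - centers[labels]) ** 2).sum(axis=1).mean())


def isometric_loss(x, z, labels, k=3):
    """Assumed form: mean squared change in distance to the k nearest
    same-cluster neighbors (local, intra-manifold structure)."""
    dx, dz = pairwise_dist(x, x), pairwise_dist(z, z)
    idx = np.arange(len(x))
    total, count = 0.0, 0
    for i in idx:
        same = np.flatnonzero((labels == labels[i]) & (idx != i))
        if same.size == 0:
            continue
        # Neighbors are chosen in the input space, then compared in both spaces.
        nn = same[np.argsort(dx[i, same])[:k]]
        total += float(((dx[i, nn] - dz[i, nn]) ** 2).sum())
        count += nn.size
    return total / max(count, 1)


def ranking_loss(centers_x, centers_z):
    """Assumed form: fraction of ordered pairs of inter-center distances whose
    relative order differs between input and latent space (global structure)."""
    iu = np.triu_indices(len(centers_x), k=1)
    dx = pairwise_dist(centers_x, centers_x)[iu]
    dz = pairwise_dist(centers_z, centers_z)[iu]
    sign_x = np.sign(dx[:, None] - dx[None, :])
    sign_z = np.sign(dz[:, None] - dz[None, :])
    return float((sign_x * sign_z < 0).mean())


def cluster_means(points, labels, n_clusters):
    """Per-cluster mean of the given points."""
    return np.stack([points[labels == c].mean(axis=0) for c in range(n_clusters)])


# Toy usage: a random linear map stands in for a learned encoder.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
x = rng.normal(size=(12, 5)) + 4.0 * labels[:, None]
z = x @ rng.normal(size=(5, 2))
cx, cz = cluster_means(x, labels, 3), cluster_means(z, labels, 3)

# Hypothetical equal weighting of the three terms.
total = clustering_loss(z, cz, labels) + isometric_loss(x, z, labels) + ranking_loss(cx, cz)
```

In this sketch, an embedding that leaves all distances unchanged incurs zero isometric and ranking penalty, so only the clustering term would pull points toward their centers; the two structure terms then act as a counterweight when the clustering term starts to distort distances.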