Conventional clustering methods based on pairwise affinity usually suffer from the concentration effect when processing high-dimensional, low-sample-size data, which makes the encoded sample proximity inaccurate and the clustering performance suboptimal. To address this issue, we propose a unified tensor clustering method (UTC) that characterizes sample proximity using the affinity among multiple samples, thereby supplementing rich spatial sample distributions to boost clustering. Specifically, we find that the third-order (triadic) tensor affinity can be constructed via the Khatri-Rao product of two affinity matrices. Furthermore, our earlier work shows that the fourth-order tensor affinity can be defined via the Kronecker product. We therefore use these two matrix products, the Khatri-Rao and Kronecker products, to mathematically integrate affinities of different orders into a unified tensor clustering framework. UTC then learns a joint low-dimensional embedding that combines the various orders. Finally, a numerical scheme is designed to solve the resulting problem. Experiments on synthetic and real-world datasets demonstrate that 1) high-order tensor affinity provides a characterization of sample proximity that supplements the popular affinity matrix; and 2) the proposed UTC improves clustering by exploiting affinities of different orders when processing high-dimensional data.
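To make the two constructions concrete, the following minimal NumPy sketch builds a third-order affinity from the Khatri-Rao product and a fourth-order affinity from the Kronecker product of a pairwise affinity matrix with itself. It is an illustration under stated assumptions, not the UTC implementation: the Gaussian affinity, the bandwidth `sigma`, the helper names `gaussian_affinity` and `khatri_rao`, and the reshape convention used to fold each product into a tensor are all choices made here, since the abstract does not specify them.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian affinity (an assumed choice; the abstract names none)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb round-off error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def khatri_rao(A, B):
    """Column-wise Kronecker product: column c equals kron(A[:, c], B[:, c])."""
    assert A.shape[1] == B.shape[1], "A and B need the same number of columns"
    rows = A.shape[0] * B.shape[0]
    return np.einsum("ic,jc->ijc", A, B).reshape(rows, A.shape[1])


# Toy high-dimensional, low-sample-size data: 5 samples, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
A = gaussian_affinity(X, sigma=10.0)
n = A.shape[0]

# Third-order affinity: the (n^2 x n) Khatri-Rao product folded into n x n x n,
# so that T3[i, j, k] = A[i, k] * A[j, k].
T3 = khatri_rao(A, A).reshape(n, n, n)

# Fourth-order affinity: the (n^2 x n^2) Kronecker product folded into
# n x n x n x n, so that T4[i, j, k, l] = A[i, k] * A[j, l].
T4 = np.kron(A, A).reshape(n, n, n, n)
```

Under this convention, each entry of `T3` couples two samples through a shared third sample, and each entry of `T4` couples two pairs of samples, which is one way to read "multiple samples' affinity" in the text above.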