The VAT (Visual Assessment of cluster Tendency) method is a visual technique for determining the potential cluster structure and the possible number of clusters in numerical data. Its improved version, iVAT, uses a path-based distance transform to make VAT more effective on "tough" cases. Both VAT and iVAT have also been used together with a single-linkage (SL) hierarchical clustering algorithm. However, both are sensitive to noise and to bridge points between clusters in the dataset, so the corresponding VAT/iVAT images are often inconclusive in such cases. In this paper, we propose a constraint-based version of iVAT, which we call ConiVAT. It uses background knowledge in the form of constraints to improve VAT/iVAT on challenging and complex datasets. ConiVAT uses the input constraints to learn the underlying similarity metric and builds a minimum transitive dissimilarity matrix before applying VAT to it. We apply ConiVAT to visual assessment and single-linkage clustering on nine datasets. The results show that it improves the quality of iVAT images for complex datasets and that it overcomes the limitation of SL clustering with VAT/iVAT caused by "noisy" bridges between clusters. Extensive experimental results on nine datasets suggest that ConiVAT outperforms three other semi-supervised clustering algorithms in terms of clustering accuracy.
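The sketch below is a minimal pure-Python illustration of the building blocks the abstract relies on. It covers three steps. First, the standard VAT reordering, which visits objects in Prim's minimum-spanning-tree order starting from one end of the largest dissimilarity. Second, the iVAT path-based transform, where each entry becomes the largest edge on the MST path between the two objects (the minimax distance). Third, single-linkage clusters obtained by cutting the k-1 longest MST edges. It does not implement ConiVAT's constraint-based metric learning, because the abstract does not specify that step. All function names (`vat`, `reorder`, `ivat`, `single_linkage_labels`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def vat(R: Matrix) -> Tuple[List[int], List[int], List[float]]:
    """VAT ordering of a symmetric dissimilarity matrix R via Prim's algorithm.

    Returns (order, parent, weight): order[r] is the object at VAT position r,
    parent[r] is the VAT position of its MST neighbour (-1 for the first object),
    and weight[r] is the length of that MST edge.
    """
    n = len(R)
    # Start at an endpoint of the largest dissimilarity, as in classic VAT.
    start = max(range(n), key=lambda i: max(R[i]))
    order, parent, weight = [start], [-1], [0.0]
    unvisited = set(range(n)) - {start}
    # best[q] = (distance from q to the nearest ordered object, that object's position)
    best = {q: (R[start][q], 0) for q in unvisited}
    while unvisited:
        # Ties are broken by object index so the ordering is deterministic.
        j = min(unvisited, key=lambda q: (best[q][0], q))
        dist, pos = best[j]
        order.append(j)
        parent.append(pos)
        weight.append(dist)
        unvisited.remove(j)
        new_pos = len(order) - 1
        for q in unvisited:
            if R[j][q] < best[q][0]:
                best[q] = (R[j][q], new_pos)
    return order, parent, weight


def reorder(R: Matrix, order: List[int]) -> Matrix:
    """The VAT image R*: R with rows and columns permuted into VAT order."""
    return [[R[a][b] for b in order] for a in order]


def ivat(parent: List[int], weight: List[float]) -> Matrix:
    """iVAT-style minimax (path-based) distances, indexed by VAT position."""
    n = len(parent)
    D = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        j = parent[r]
        for c in range(r):
            # The MST path from r to any earlier object c passes through j.
            D[r][c] = weight[r] if c == j else max(weight[r], D[j][c])
            D[c][r] = D[r][c]
    return D


def single_linkage_labels(order: List[int], parent: List[int],
                          weight: List[float], k: int) -> List[int]:
    """Single-linkage labels (per original object) from cutting k-1 longest MST edges."""
    n = len(order)
    cut = set(sorted(range(1, n), key=lambda r: weight[r], reverse=True)[: k - 1])
    by_pos = [0] * n
    next_label = 1
    for r in range(1, n):
        if r in cut:
            by_pos[r] = next_label  # a cut edge starts a new cluster
            next_label += 1
        else:
            by_pos[r] = by_pos[parent[r]]  # inherit the MST neighbour's label
    labels = [0] * n
    for pos, obj in enumerate(order):
        labels[obj] = by_pos[pos]
    return labels
```

For two well-separated groups of points on a line, the VAT image shows two dark diagonal blocks, and the iVAT matrix makes the blocks crisp. Cutting the single longest MST edge recovers the two groups. According to the abstract, ConiVAT would pass a dissimilarity matrix derived from a constraint-learned metric into this pipeline instead of raw distances. That learning step is not shown here.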