Ontology-based clustering has gained attention in recent years because of the potential benefits of ontologies. Current ontology-based clustering approaches have mainly been applied to reduce the dimensionality of attributes in text document clustering, where the reduction helps to produce high-quality clusters. However, ontology-based approaches to clustering numerical datasets have not received enough attention. Moreover, some of the literature reports that ontology-based clustering can produce either high-quality or low-quality clusters from a dataset. In this paper, we therefore present a clustering approach that uses a domain ontology to reduce the dimensionality of attributes in a numerical dataset and to produce high-quality clusters. From each original dataset, we derive three datasets using the domain ontology. We then cluster these datasets using a genetic algorithm-based clustering technique called GenClust++. The clusters of each dataset are evaluated in terms of the sum of squared errors (SSE). We use six numerical datasets to evaluate the performance of our ontology-based approach. The experimental results indicate that cluster quality gradually improves from the lower to the higher levels of a domain ontology.
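To make the evaluation measure concrete, the following is a minimal sketch of how SSE can be computed for a clustering, and of one possible way to lift attributes to a higher ontology level. It is an illustration under stated assumptions, not the method used in the paper: the function `lift_to_concepts`, the averaging rule for merging attributes, the concept names `size` and `cost`, and the toy data are all hypothetical, and the cluster labels are fixed by hand rather than produced by GenClust++.

```python
import numpy as np


def sse(X, labels):
    """Sum of squared Euclidean distances from each record to its cluster centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total


def lift_to_concepts(X, attr_to_concept):
    """Hypothetical reduction step: average all attributes that share a parent concept.

    attr_to_concept[j] names the ontology concept that column j belongs to.
    Averaging is an assumption made for this sketch; the source does not
    specify how attributes are merged.
    """
    X = np.asarray(X, dtype=float)
    concepts = sorted(set(attr_to_concept))
    columns = []
    for concept in concepts:
        idx = [j for j, name in enumerate(attr_to_concept) if name == concept]
        columns.append(X[:, idx].mean(axis=1))
    return np.column_stack(columns), concepts


# Toy data: three attributes, the first two assumed to share the concept "size".
X = np.array([
    [1.0, 3.0, 10.0],
    [3.0, 5.0, 12.0],
    [9.0, 11.0, 2.0],
    [11.0, 13.0, 4.0],
])
labels = np.array([0, 0, 1, 1])  # hand-assigned cluster labels

X_reduced, concepts = lift_to_concepts(X, ["size", "size", "cost"])

print("SSE, original attributes:", sse(X, labels))
print("SSE, concept-level attributes:", sse(X_reduced, labels))
```

A lower SSE means that records lie closer to their cluster centroids. Any clustering routine that returns one label per record could supply `labels` in place of the hand-assigned values used here.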