Fuzzy rough sets are well suited to working with vague, imprecise or uncertain information and have been successfully applied to real-world classification problems. One of the prominent representatives of this theory is fuzzy-rough nearest neighbours (FRNN), a classification algorithm based on the classical k-nearest neighbours algorithm. The crux of FRNN is the indiscernibility relation, which measures how similar two elements of the data set of interest are. In this paper, we investigate the impact of this indiscernibility relation on the performance of FRNN classification. In addition to relations based on distance functions and kernels, we explore, for the first time, the effect of distance metric learning on FRNN. Furthermore, we introduce an asymmetric, class-specific relation based on the Mahalanobis distance, which uses the correlation within each class. This relation shows a significant improvement over the regular Mahalanobis distance, but is still outperformed by the Manhattan distance. Overall, the Neighbourhood Components Analysis algorithm is found to be the best performer, gaining accuracy at the expense of speed.
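To make the two kinds of relation mentioned above more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes features rescaled to [0, 1], a similarity defined as one minus the mean absolute difference, and a class-specific Mahalanobis distance computed from the covariance of the class of the second argument. The function names and the `ridge` regularisation term are hypothetical choices made for illustration.

```python
import numpy as np

def manhattan_similarity(x, y):
    """Similarity in [0, 1] from the mean per-feature absolute difference.

    Assumption: every feature has already been rescaled to [0, 1].
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - float(np.mean(np.abs(x - y)))

def class_mahalanobis(x, y, class_samples, ridge=1e-6):
    """Mahalanobis distance between x and y under the covariance of y's class.

    Assumption: `class_samples` holds the training instances that share
    y's class label; `ridge` keeps the covariance matrix invertible.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.cov(np.asarray(class_samples, dtype=float), rowvar=False)
    cov = np.atleast_2d(cov) + ridge * np.eye(x.size)
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Two hypothetical classes with different spreads.
class_a = [[0, 0], [1, 0], [0, 1], [1, 1]]
class_b = [[0, 0], [2, 0], [0, 2], [2, 2]]
d_a = class_mahalanobis([0, 0], [1, 1], class_a)
d_b = class_mahalanobis([1, 1], [0, 0], class_b)
```

Under these assumptions, the asymmetry arises whenever the two instances belong to different classes: the same pair of points is measured against a different covariance matrix depending on which instance is the second argument, so `d_a` and `d_b` differ.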