Attribute reduction is one of the most important research topics in rough set theory, and many rough set-based attribute reduction methods have been proposed. However, most of them are designed specifically for either labeled data or unlabeled data, whereas many real-world applications involve partial supervision. In this paper, we propose a rough set-based semi-supervised attribute reduction method for partially labeled data. In particular, using prior information about the class distribution of the data, we first develop a simple yet effective strategy to produce proxy labels for the unlabeled data. We then integrate the concept of information granularity into an information-theoretic measure, which yields a novel granular conditional entropy measure, and we prove its monotonicity theoretically. Furthermore, we provide a fast heuristic algorithm that generates the optimal reduct of partially labeled data; it can accelerate attribute reduction by simultaneously removing irrelevant examples and excluding redundant attributes. Extensive experiments on UCI data sets demonstrate that the proposed semi-supervised attribute reduction method is promising and, in terms of classification performance, even compares favourably with supervised methods applied to the labeled data together with the unlabeled data given their true labels.
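The abstract does not define the granular conditional entropy or the proxy-labelling strategy, so the following is only a minimal sketch of the general shape of an entropy-driven heuristic reduction, under stated assumptions. It uses the classical Shannon conditional entropy H(D|B) of rough set theory in place of the paper's granular measure, and it assumes every unlabeled example has already been given a proxy label. Examples whose equivalence class has become label-pure are dropped from later rounds, which is one common way of "removing irrelevant examples"; because such classes contribute zero entropy under any refinement, this does not change the computed values. The function names `conditional_entropy`, `pure_examples` and `greedy_reduct` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, labels, attrs, idx, n):
    """Classical H(D | attrs) over the examples in idx, normalised by the full size n.

    Examples outside idx are assumed to lie in label-pure blocks, which contribute 0.
    """
    blocks = defaultdict(list)
    for i in idx:
        blocks[tuple(rows[i][a] for a in attrs)].append(labels[i])
    h = 0.0
    for block in blocks.values():
        size = len(block)
        for count in Counter(block).values():
            p = count / size
            h -= (size / n) * p * log2(p)
    return h


def pure_examples(rows, labels, attrs, idx):
    """Indices in idx whose equivalence block under attrs carries a single label."""
    blocks = defaultdict(list)
    for i in idx:
        blocks[tuple(rows[i][a] for a in attrs)].append(i)
    return {
        i
        for block in blocks.values()
        if len({labels[j] for j in block}) == 1
        for i in block
    }


def greedy_reduct(rows, labels, eps=1e-9):
    """Forward selection until H(D | reduct) matches H(D | all attributes).

    `labels` is assumed to mix true labels and (hypothetical) proxy labels.
    """
    n = len(rows)
    all_attrs = list(range(len(rows[0])))
    target = conditional_entropy(rows, labels, all_attrs, range(n), n)
    reduct, active = [], set(range(n))
    current = conditional_entropy(rows, labels, reduct, active, n)
    while current > target + eps and len(reduct) < len(all_attrs):
        candidates = [a for a in all_attrs if a not in reduct]
        # Add the attribute that lowers the conditional entropy the most.
        best = min(
            candidates,
            key=lambda a: conditional_entropy(rows, labels, reduct + [a], active, n),
        )
        reduct.append(best)
        # Drop examples that are already consistently classified.
        active -= pure_examples(rows, labels, reduct, active)
        current = conditional_entropy(rows, labels, reduct, active, n)
    return reduct


rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
print(greedy_reduct(rows, labels))  # attribute 0 alone determines the label
```

Plain greedy forward selection like this can still keep attributes that later become redundant; a backward elimination pass is a common addition, and the paper's own algorithm may handle redundancy differently.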