Feature selection is an effective preprocessing technique for reducing data dimensionality. Rough set theory provides many measures for feature selection, among which mutual information is one of the most important attribute measures. However, mutual-information-based importance measures are computationally expensive and inaccurate, especially when the number of samples is very large, and feature selection is an NP-hard problem on high-dimensional and ultra-high-dimensional data sets. Although many representative swarm intelligence feature selection strategies have been proposed to improve accuracy, a bottleneck remains when they are applied to high-dimensional, large-scale data sets: they consume substantial computing resources and tend to select weakly correlated and redundant features. In this study, we propose an incremental mutual information based improved swarm intelligence optimization method (IMIICSO), which uses rough set theory to calculate feature importance based on mutual information. The method extracts reduction knowledge from the decision table to guide the global search of the swarm algorithm. By exploring the computation of mutual information over very large sample sets, IMIICSO not only discards useless features to speed up the internal and external computation, but also effectively reduces the cardinality of the optimal feature subset, yielding the smallest cardinality among the compared methods. The accuracy of the feature subsets selected by the improved cockroach swarm algorithm based on incremental mutual information is better than or almost the same as that of the original swarm intelligence optimization algorithm. Experiments on 10 UCI data sets, including large-scale and high-dimensional ones, confirm the efficiency and effectiveness of the proposed algorithm.
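To make the mutual-information importance measure concrete, the following is a minimal sketch of the standard (non-incremental) definition on a small decision table. It is not the IMIICSO procedure itself: the function names `entropy`, `mutual_information`, and `significance`, the tuple-based table layout, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(rows, cols):
    """Shannon entropy of the partition of `rows` induced by the columns in `cols`."""
    n = len(rows)
    # Rows with equal values on `cols` fall into the same equivalence class.
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())

def mutual_information(rows, cond, dec):
    """I(cond; dec) = H(cond) + H(dec) - H(cond, dec)."""
    cond, dec = list(cond), list(dec)
    return entropy(rows, cond) + entropy(rows, dec) - entropy(rows, cond + dec)

def significance(rows, selected, candidate, dec):
    """Gain in mutual information from adding `candidate` to `selected`."""
    selected = list(selected)
    return (mutual_information(rows, selected + [candidate], dec)
            - mutual_information(rows, selected, dec))

# Hypothetical decision table: columns 0 and 1 are condition attributes,
# column 2 is the decision attribute. Column 0 determines the decision.
table = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
]

print(mutual_information(table, [0], [2]))  # informative attribute
print(mutual_information(table, [1], [2]))  # uninformative attribute
print(significance(table, [0], 1, [2]))     # gain from adding column 1
```

In this toy table, column 1 adds no information once column 0 is selected, so a search guided by this measure could discard it. Recomputing every entropy from scratch, as done here, is the cost that an incremental scheme would aim to avoid.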