Population-based optimization algorithms have provided promising results in feature selection (FS) problems. However, their main challenge is high time complexity. Moreover, the interaction between features is another major challenge in FS problems that directly affects classification performance. In this paper, an estimation of distribution algorithm (EDA) is proposed to meet three goals. First, as an extension of EDA, the proposed method generates only two individuals in each iteration; these compete based on a fitness function and evolve over the course of the algorithm according to our proposed update procedure. Second, we provide a guiding technique for determining the number of features for individuals in each iteration. As a result, the number of selected features in the final solution is optimized during the evolution process. These two advantages can increase the convergence speed of the algorithm. Third, as the main contribution of the paper, in addition to considering the importance of each feature alone, the proposed method can consider the interaction between features. Thus, it can deal with complementary features and consequently increase classification performance. To do this, we provide a conditional probability scheme that considers the joint probability distribution of selecting two features. The introduced probabilities successfully detect correlated features. Experimental results on a synthetic dataset with correlated features demonstrate the performance of our proposed approach on these types of features. Furthermore, the results on 13 real-world datasets obtained from the UCI repository show the superiority of the proposed method in comparison with some state-of-the-art approaches.
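To make the two-individual idea concrete, the following is a minimal sketch in the style of a compact genetic algorithm, not the update procedure proposed in the paper. It keeps one selection probability per feature, samples two feature masks per iteration, and shifts the probabilities toward the winner wherever the two masks differ. The names `compact_eda` and `toy_fitness`, the step size, and the probability bounds are assumptions made for illustration. The sketch tracks only marginal probabilities; it does not reproduce the paper's conditional probability scheme or its guidance on the number of features.

```python
import random


def sample_mask(p, rng):
    """Draw one individual: feature i is selected with probability p[i]."""
    return [rng.random() < pi for pi in p]


def toy_fitness(mask):
    """Hypothetical fitness: features 0 and 1 are complementary (they help
    only when both are selected), and each selected feature costs 0.05."""
    bonus = 1.0 if (mask[0] and mask[1]) else 0.0
    return bonus - 0.05 * sum(mask)


def compact_eda(fitness, n_features, n_iters=500, step=0.05, seed=0):
    """Compact-GA-style EDA: two individuals compete in every iteration."""
    rng = random.Random(seed)
    p = [0.5] * n_features  # marginal selection probabilities
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_iters):
        a, b = sample_mask(p, rng), sample_mask(p, rng)
        fa, fb = fitness(a), fitness(b)
        if fa >= fb:
            winner, loser, fw = a, b, fa
        else:
            winner, loser, fw = b, a, fb
        # Move probabilities toward the winner only where the two masks differ.
        for i in range(n_features):
            if winner[i] != loser[i]:
                delta = step if winner[i] else -step
                # Assumed bounds keep every feature reachable.
                p[i] = min(0.95, max(0.05, p[i] + delta))
        if fw > best_fit:
            best_mask, best_fit = winner, fw
    return p, best_mask, best_fit


p, best_mask, best_fit = compact_eda(toy_fitness, n_features=6)
print([round(pi, 2) for pi in p], best_mask, best_fit)
```

In practice the fitness would be a wrapper measure such as cross-validated classification accuracy, which is why evaluating only two individuals per iteration can matter for running time.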