We propose a flexible ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace optimally selected from a collection of random subspaces. To conduct subspace selection, we propose a new criterion, the ratio information criterion (RIC), based on the weighted Kullback-Leibler divergence. The theoretical analysis covers the risk and Monte-Carlo variance of the RaSE classifier, establishes the screening consistency and weak consistency of RIC, and provides an upper bound for the misclassification rate of the RaSE classifier. In addition, we show that in a high-dimensional framework, the number of random subspaces needs to be very large to guarantee that a subspace covering the signals is selected. Therefore, we propose an iterative version of the RaSE algorithm and prove that, under some specific conditions, fewer random subspaces need to be generated to find a desirable subspace through iteration. An array of simulations under various models and real-data applications demonstrates the effectiveness and robustness of the RaSE classifier and its iterative version in terms of low misclassification rate and accurate feature ranking. The RaSE algorithm is implemented in the R package RaSEn on CRAN.
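To make the ensemble structure concrete, the following is a minimal Python sketch of the idea described above, not the RaSEn package or its API. It assumes binary labels in {0, 1} and makes several simplifying substitutions, all hypothetical: a nearest-centroid base classifier, training error as the subspace-selection criterion in place of RIC, a fixed voting threshold `alpha = 0.5`, and an iteration step that re-weights feature sampling by each feature's selection frequency plus a small floor `C0 / p`. The names `B1` (number of weak learners), `B2` (candidate subspaces per learner), and `D` (maximum subspace size) are illustrative parameters.

```python
import random
from collections import Counter


def centroid_classifier(X, y, feats):
    """Hypothetical base learner: nearest class centroid on the features in `feats`."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(cents, key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c])))

    return predict


def training_error(predict, X, y):
    """Stand-in selection criterion (NOT RIC): fraction of misclassified training points."""
    return sum(predict(x) != t for x, t in zip(X, y)) / len(y)


def weighted_sample(p, d, weights, rng):
    """Draw d distinct feature indices, each draw proportional to the remaining weights."""
    remaining, picked = list(range(p)), []
    for _ in range(d):
        j = rng.choices(remaining, weights=[weights[k] for k in remaining], k=1)[0]
        remaining.remove(j)
        picked.append(j)
    return sorted(picked)


def rase_fit(X, y, B1=20, B2=50, D=3, weights=None, rng=None):
    """For each of B1 learners, keep the best of B2 random subspaces of size 1..D."""
    rng = rng or random.Random(0)
    p = len(X[0])
    D = min(D, p)
    weights = weights or [1.0] * p
    learners, chosen = [], []
    for _ in range(B1):
        best = None
        for _ in range(B2):
            feats = weighted_sample(p, rng.randint(1, D), weights, rng)
            clf = centroid_classifier(X, y, feats)
            err = training_error(clf, X, y)
            if best is None or err < best[0]:
                best = (err, feats, clf)
        chosen.append(best[1])
        learners.append(best[2])
    return learners, chosen


def rase_predict(learners, x, alpha=0.5):
    """Average the 0/1 votes and compare with the threshold alpha."""
    vote = sum(clf(x) for clf in learners) / len(learners)
    return int(vote > alpha)


def feature_frequencies(chosen, p):
    """Proportion of selected subspaces that contain each feature (a feature ranking)."""
    counts = Counter(j for feats in chosen for j in feats)
    return [counts[j] / len(chosen) for j in range(p)]


def iterative_rase(X, y, T=2, B1=20, B2=50, D=3, C0=0.1, rng=None):
    """Hypothetical iteration: reuse selection frequencies as sampling weights."""
    rng = rng or random.Random(0)
    p, weights = len(X[0]), None
    for _ in range(T):
        learners, chosen = rase_fit(X, y, B1=B1, B2=B2, D=D, weights=weights, rng=rng)
        freq = feature_frequencies(chosen, p)
        # Small floor keeps every feature reachable in the next round.
        weights = [f + C0 / p for f in freq]
    return learners, freq
```

In this sketch, later rounds draw frequently selected features more often, so fewer candidate subspaces are wasted on noise-only feature sets; the exact update rule and criterion used in the paper may differ from the ones assumed here.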