This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method speeds up the evaluation of this criterion during greedy search. The supporting theorems developed for the method are also fundamental to the understanding of canonical correlation analysis. In empirical studies, a synthetic dataset is used to demonstrate the speed advantage of the proposed method, and eight real datasets are used to show the effectiveness of the proposed ranking criterion in both classification and regression. The results show that the proposed method is considerably faster than the definition-based method, and that the proposed ranking criterion is competitive with seven mutual-information-based criteria.
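To make the ranking criterion concrete, the following is a minimal sketch of the definition-based baseline only: it recomputes the sum of squared canonical correlations from scratch for every candidate feature in a greedy forward search. It is not the accelerated method proposed in the paper, whose details are not given here. The function names `sum_squared_canonical_correlations` and `greedy_select` are hypothetical, and the sketch assumes that the centered feature and target matrices have full column rank and that there are more samples than columns.

```python
import numpy as np


def sum_squared_canonical_correlations(X, Y):
    """Sum of squared canonical correlations between X and Y.

    Assumption: after centering, X and Y have full column rank, so the
    singular values of Qx.T @ Qy are the canonical correlations.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    # Center each column.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Squared Frobenius norm = sum of squared singular values.
    return float(np.sum((Qx.T @ Qy) ** 2))


def greedy_select(X, Y, n_select):
    """Greedy forward selection that recomputes the criterion from
    its definition for every candidate (the slow baseline)."""
    X = np.asarray(X, dtype=float)
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        scores = [
            sum_squared_canonical_correlations(X[:, selected + [j]], Y)
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Synthetic illustration: the target depends on features 2 and 4 only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 2.0 * X[:, 2] - X[:, 4] + 0.01 * rng.normal(size=200)
    print(greedy_select(X, y, n_select=2))
```

For a single feature and a single target, this criterion reduces to the squared Pearson correlation, which gives a quick sanity check. Each greedy step in this baseline repeats a full QR decomposition per candidate, which is the cost that the abstract says the proposed method avoids.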