When processing high-dimensional datasets, a common pre-processing step is feature selection. Filter-based feature selection algorithms are not tailored to a specific classification method; instead, they rank the relevance of each feature with respect to the target and the task. This work focuses on a graph-based filter feature selection method suited to multi-class classification tasks. We aim to drastically reduce the number of selected features in order to create a sketch of the original data that encodes valuable information for the classification task. The proposed graph-based algorithm is constructed by combining the Jeffries-Matusita distance with a non-linear dimension reduction method, diffusion maps. Feature elimination is performed based on the distribution of the features in the low-dimensional space. Then, a very small number of features that have complementary separation strengths are selected. Moreover, the low-dimensional embedding allows the feature space to be visualized. Experimental results are provided for public datasets and compared with known filter-based feature selection techniques.
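To make the two building blocks named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each feature is univariate Gaussian within a class, uses the convention JM = 2(1 - exp(-B)) with B the Bhattacharyya distance, and picks the kernel width by a median heuristic; none of these choices is stated in the abstract. The function names `jm_matrix` and `diffusion_map` are hypothetical. Each feature is described by its vector of pairwise class separations, and the features are then embedded in a low-dimensional diffusion space.

```python
import itertools
import numpy as np


def jm_matrix(X, y, eps=1e-12):
    """Per-feature JM distance for every class pair.

    Assumption of this sketch: each feature is univariate Gaussian
    within a class, so the Bhattacharyya distance has a closed form.
    """
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    varis = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    cols = []
    for i, j in itertools.combinations(range(len(classes)), 2):
        v_sum = varis[i] + varis[j]
        # Bhattacharyya distance between two univariate Gaussians
        b = (0.25 * (means[i] - means[j]) ** 2 / v_sum
             + 0.5 * np.log(v_sum / (2.0 * np.sqrt(varis[i] * varis[j]))))
        cols.append(2.0 * (1.0 - np.exp(-b)))  # JM distance, in [0, 2)
    return np.stack(cols, axis=1)  # shape: (n_features, n_class_pairs)


def diffusion_map(points, n_components=2, t=1):
    """Embed the rows of `points` with a basic diffusion map."""
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    scale = np.median(sq[sq > 0])  # heuristic kernel width (assumption)
    K = np.exp(-sq / scale)
    d = K.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^-1 K
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of P
    # Skip the trivial constant eigenvector; scale by eigenvalues^t
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

With a data matrix `X` of shape `(n_samples, n_features)` and labels `y`, calling `diffusion_map(jm_matrix(X, y))` yields one low-dimensional point per feature. Features that separate the same class pairs land close together, which is the kind of structure the elimination and selection steps described above could exploit. The selection rule itself is not specified in the abstract and is not sketched here.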