Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and the social sciences. Algorithms of the popular Relief family assign importance scores to features by iteratively accounting for the nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and are not well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating downstream learning with conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where the dimensionality of a given embedding is determined by the intrinsic dimensionality of the considered data set. The developed ReliefE algorithm is faster and can yield better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. ReliefE remains applicable to high-dimensional data sets because its implementation relies on sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index.
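To make the Relief mechanism of "nearest relevant and irrelevant instances" concrete, here is a minimal sketch of the classic two-class Relief weight update in Python with NumPy. It is not ReliefE: it works in the original feature space with dense arrays and uses no embeddings. The function name `relief`, the Manhattan distance, the min-max scaling and the random sampling of reference instances are illustrative assumptions.

```python
import numpy as np


def relief(X, y, n_iterations=None, seed=None):
    """Classic two-class Relief weights (illustrative sketch, not ReliefE).

    Assumes every class has at least two instances and uses the
    Manhattan distance on min-max scaled features.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    m = n if n_iterations is None else n_iterations

    # Min-max scale so every per-feature difference lies in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    weights = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)                      # random reference instance
        dists = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distances to all instances
        dists[i] = np.inf                        # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class instance
        # Penalize features that differ on the hit, reward those that differ on the miss.
        weights += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return weights
```

For example, with `X = [[0.0, 0.0], [0.1, 1.0], [0.9, 0.0], [1.0, 1.0]]` and `y = [0, 0, 1, 1]`, feature 0 separates the classes while feature 1 is noise, so `relief(X, y, seed=0)` gives feature 0 a positive weight and feature 1 a negative one. The repeated nearest-neighbor searches over the full feature space are the cost that the abstract identifies as a bottleneck for high-dimensional sparse data.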