In both Feature Selection and Interpretable AI, there is a desire to `rank' features by their importance. Such feature importance rankings can then be used either to (1) reduce the dataset size or (2) interpret the Machine Learning model. In the literature, however, such feature rankers are not evaluated in a systematic, consistent way: papers argue in many different ways which feature importance ranker works best. This paper fills that gap by proposing a new evaluation methodology. By using synthetic datasets, the ground-truth feature importance scores are known beforehand, allowing a more systematic evaluation. To facilitate large-scale experimentation with the new methodology, a benchmarking framework called fseval was built in Python. The framework can run experiments in parallel and distribute them over machines on HPC systems. Through an integration with the online platform Weights and Biases, charts can be explored interactively on a live dashboard. The software was released as open source and is published as a package on PyPI. The research concludes by exploring one such large-scale experiment, examining the strengths and weaknesses of the participating algorithms on many fronts.
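To make the evaluation idea concrete, the sketch below illustrates how a feature ranker can be scored against known ground truth on a synthetic dataset. This is a minimal illustration only, not fseval's own API: the dataset generator, the random-forest importances used as an example ranker, and the precision-at-k score are stand-ins chosen for this sketch.

```python
# Minimal sketch: scoring a feature ranker against ground-truth relevant
# features on a synthetic dataset (illustrative only; not fseval's API).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n_informative = 5
# With shuffle=False, the informative features occupy the first columns,
# so the ground-truth relevant feature indices are known beforehand.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=n_informative,
    n_redundant=0, shuffle=False, random_state=0,
)
relevant = set(range(n_informative))

# Any feature ranker could be plugged in here; impurity-based random-forest
# importances serve as an example ranker.
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(ranker.feature_importances_)[::-1]  # best feature first

# Evaluate: fraction of the top-k ranked features that are truly relevant.
k = n_informative
precision_at_k = len(set(ranking[:k]) & relevant) / k
print(f"precision@{k}: {precision_at_k:.2f}")
```

Because the relevant features are fixed by construction, many rankers can be compared on exactly the same ground truth, which is the core of the proposed methodology.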