This paper presents Ensemble Feature Importance (EFI), an open-source Python toolbox that gives machine learning (ML) researchers, domain experts, and decision makers robust and accurate feature importance quantification and, through fuzzy sets, a more reliable mechanistic interpretation of feature importance in prediction problems. The toolbox was developed to address two related problems: uncertainty in feature importance quantification, and the resulting lack of trustworthy interpretation. Both arise because many ML algorithms and feature importance calculation methods are available, and because results depend on the dataset. EFI merges the results of multiple ML models and different feature importance calculation approaches using data bootstrapping and decision fusion techniques such as mean, majority voting, and fuzzy logic. The main attributes of the EFI toolbox are: (i) automatic optimisation of ML algorithms, (ii) automatic computation of a set of feature importance coefficients from the optimised ML algorithms and feature importance calculation techniques, (iii) automatic aggregation of the importance coefficients using multiple decision fusion techniques, and (iv) fuzzy membership functions that show how important each feature is to the prediction task. The paper describes the key modules and functions of the toolbox and illustrates their application with a simple example on the well-known Iris dataset.
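The aggregation step in (iii) and (iv) can be illustrated with a minimal sketch that uses only the Python standard library. It assumes that importance coefficients have already been computed for several (model, feature importance method) runs, for example on bootstrap samples. The coefficient values and every name in the code (`RUNS`, `normalise`, `mean_fusion`, `majority_vote_fusion`, `MEMBERSHIP`, `fuzzy_fusion`) are hypothetical and are not EFI's API. The fuzzy step is a simplified stand-in that averages membership degrees in three illustrative sets (low, moderate, high); it does not reproduce EFI's fuzzy-logic fusion.

```python
from statistics import mean

# Hypothetical importance coefficients for the four Iris features
# (sepal length, sepal width, petal length, petal width). Each row stands
# for one (model, feature importance method) run; values are made up.
RUNS = [
    [0.10, 0.02, 0.45, 0.43],
    [0.08, 0.05, 0.40, 0.47],
    [0.12, 0.01, 0.50, 0.37],
]


def normalise(coeffs):
    """Min-max scale one run's coefficients to [0, 1] so runs are comparable."""
    lo, hi = min(coeffs), max(coeffs)
    if hi == lo:
        return [0.0] * len(coeffs)
    return [(c - lo) / (hi - lo) for c in coeffs]


def mean_fusion(runs):
    """Mean fusion: average each feature's normalised coefficient across runs."""
    scaled = [normalise(row) for row in runs]
    return [mean(col) for col in zip(*scaled)]


def majority_vote_fusion(runs, top_k=1):
    """Voting: fraction of runs that rank each feature among their top_k."""
    n_features = len(runs[0])
    votes = [0] * n_features
    for row in runs:
        ranked = sorted(range(n_features), key=lambda j: row[j], reverse=True)
        for j in ranked[:top_k]:
            votes[j] += 1
    return [v / len(runs) for v in votes]


# Hypothetical membership functions on a normalised importance x in [0, 1]:
# "low" falls from 1 at x=0 to 0 at x=0.5, "moderate" peaks at x=0.5,
# and "high" rises from 0 at x=0.5 to 1 at x=1.
MEMBERSHIP = {
    "low": lambda x: max(0.0, min(1.0, (0.5 - x) / 0.5)),
    "moderate": lambda x: max(0.0, 1.0 - abs(x - 0.5) / 0.5),
    "high": lambda x: max(0.0, min(1.0, (x - 0.5) / 0.5)),
}


def fuzzy_fusion(runs):
    """Average membership degrees per feature and pick the strongest label."""
    scaled = [normalise(row) for row in runs]
    result = []
    for col in zip(*scaled):
        degrees = {name: mean(mu(x) for x in col) for name, mu in MEMBERSHIP.items()}
        result.append((max(degrees, key=degrees.get), degrees))
    return result


if __name__ == "__main__":
    names = ["sepal length", "sepal width", "petal length", "petal width"]
    print("mean fusion:", [round(v, 3) for v in mean_fusion(RUNS)])
    print("top-2 votes:", majority_vote_fusion(RUNS, top_k=2))
    for name, (label, _) in zip(names, fuzzy_fusion(RUNS)):
        print(f"{name}: {label}")
```

Normalising each run before fusing keeps one model or method with large raw coefficients from dominating the ensemble. The fuzzy output also returns a graded degree for every set rather than a single number, which is the kind of interpretation that attribute (iv) refers to.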