Feature construction can contribute to comprehensibility and performance of machine learning models. Unfortunately, it usually requires exhaustive search in the attribute space or time-consuming human involvement to generate meaningful features. We propose a novel heuristic approach for reducing the search space based on aggregation of instance-based explanations of predictive models. The proposed Explainable Feature Construction (EFC) methodology identifies groups of co-occurring attributes exposed by popular explanation methods, such as IME and SHAP. We empirically show that reducing the search to these groups significantly reduces the time of feature construction using logical, relational, Cartesian, numerical, and threshold num-of-N and X-of-N constructive operators. An analysis on 10 transparent synthetic datasets shows that EFC effectively identifies informative groups of attributes and constructs relevant features. Using 30 real-world classification datasets, we show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets. Finally, EFC generated interpretable features on a real-world problem from the financial industry, which were confirmed by a domain expert.
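The core idea, restricting feature construction to groups of attributes that co-occur in instance-level explanations, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `noise_threshold` and `min_support` parameters, and the hand-made attribution values are all assumptions for illustration. In practice the attribution matrix would come from an explanation method such as IME or SHAP.

```python
from collections import Counter
from itertools import combinations


def cooccurring_groups(attributions, names, noise_threshold=0.1, min_support=0.3):
    """Return attribute groups that are jointly important in many explanations.

    attributions: one row per instance, one contribution per attribute
                  (hypothetical values standing in for IME/SHAP output).
    noise_threshold, min_support: illustrative parameters, not from the paper.
    """
    counts = Counter()
    for row in attributions:
        # Attributes whose contribution magnitude exceeds the noise threshold.
        active = frozenset(
            n for n, v in zip(names, row) if abs(v) >= noise_threshold
        )
        if len(active) >= 2:  # a single attribute cannot form a group
            counts[active] += 1
    total = len(attributions)
    return [sorted(g) for g, c in counts.most_common() if c / total >= min_support]


def candidate_pairs(groups):
    """Attribute pairs to hand to constructive operators (e.g. logical AND)."""
    pairs = set()
    for group in groups:
        pairs.update(combinations(group, 2))
    return sorted(pairs)


# Toy explanation matrix: 5 instances, 4 attributes (values are made up).
NAMES = ["a", "b", "c", "d"]
ATTRIBUTIONS = [
    [0.5, 0.4, 0.0, 0.01],
    [0.5, 0.4, 0.0, 0.01],
    [0.5, 0.4, 0.0, 0.01],
    [0.0, 0.0, 0.6, 0.3],
    [0.02, 0.0, 0.0, 0.9],
]

GROUPS = cooccurring_groups(ATTRIBUTIONS, NAMES)
PAIRS = candidate_pairs(GROUPS)
```

In this toy case only the group of `a` and `b` meets the support threshold, so a pairwise operator is tried on one attribute pair instead of all six possible pairs.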