With the widespread use of machine learning to support decision-making, it is increasingly important to verify and understand why a particular output is produced. Although post-training feature importance approaches assist this interpretation, there is an overall lack of consensus on how feature importance should be quantified, which makes explanations of model predictions unreliable. In addition, many of these explanations depend on the specific machine learning approach employed and on the subset of data used when calculating feature importance. A possible way to improve the reliability of explanations is to combine the results of multiple feature importance quantifiers, obtained from different machine learning approaches and coupled with re-sampling. Current state-of-the-art ensemble feature importance fusion uses crisp techniques to fuse results from different approaches. There is, however, a significant loss of information, as these approaches are not context-aware and reduce several quantifiers to a single crisp output. More importantly, their representation of 'importance' as coefficients is misleading and incomprehensible to end-users and decision makers. Here we show how fuzzy data fusion methods can overcome some of the important limitations of crisp fusion methods.
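As a rough illustration of the information loss attributed to crisp fusion, the sketch below averages importance scores from several quantifiers into a single number per feature. It is not the method proposed here: the scores, the feature names and the helper functions `crisp_mean_fusion` and `disagreement` are all hypothetical, and simple averaging stands in for whatever crisp fusion rule is actually used.

```python
from statistics import mean, pstdev

# Hypothetical normalised importance scores (0 to 1) for two features,
# one score per (model, importance method, resample) combination.
scores = {
    "feature_a": [0.50, 0.50, 0.50, 0.50],  # quantifiers agree
    "feature_b": [0.10, 0.90, 0.20, 0.80],  # quantifiers disagree strongly
}


def crisp_mean_fusion(all_scores):
    """Fuse each feature's scores into one number by simple averaging."""
    return {name: mean(values) for name, values in all_scores.items()}


def disagreement(all_scores):
    """Population standard deviation of each feature's scores."""
    return {name: pstdev(values) for name, values in all_scores.items()}


fused = crisp_mean_fusion(scores)
spread = disagreement(scores)

for name in scores:
    print(f"{name}: fused={fused[name]:.2f}, spread={spread[name]:.2f}")
```

Both features receive the same fused score, even though the quantifiers agree on one and disagree sharply on the other. The disagreement is visible only in the spread, which a single crisp output discards.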