Response curves show the magnitude of a sensitive system's response to a varying stimulus. However, the response of such a system may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of the response curve generated for a selected input feature (the "active feature") may depend on the values of the other input features (the "passive features"). In this work, we consider systems whose response is approximated by regression neural networks. We propose using counterfactual explanations (CFEs) to identify the features most relevant to the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE identifies the minimal combination of passive features that must be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and on two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset agreed with an analysis of the equation used to generate the problem. Results on the yield prediction datasets revealed that the impact of passive features on fertilizer responsivity depends on the terrain characteristics of each field.
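The dependence of a response curve's shape on passive features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical closed-form stand-in for a trained regression network, and comparing mean-centred curves is only one possible way to measure a change in shape. It does not implement the genetic algorithm or the multi-objective formulation described above.

```python
import numpy as np

def toy_model(X):
    """Hypothetical stand-in for a trained regression network.
    Columns: x0 = active feature, x1 and x2 = passive features."""
    x0, x1, x2 = X[:, 0], X[:, 1], X[:, 2]
    # x1 interacts with x0 (changes the curve's shape); x2 only adds an offset.
    return x1 * np.sin(x0) + 0.5 * x2

def response_curve(model, sample, active_idx, grid):
    """Sweep the active feature over `grid`, holding the passive features fixed."""
    X = np.tile(sample, (len(grid), 1))
    X[:, active_idx] = grid
    return model(X)

def shape_distance(curve_a, curve_b):
    """Distance between mean-centred curves, so pure vertical shifts are ignored
    (an assumed shape measure, chosen for simplicity)."""
    a = curve_a - curve_a.mean()
    b = curve_b - curve_b.mean()
    return float(np.linalg.norm(a - b))

grid = np.linspace(0.0, np.pi, 50)
sample = np.array([0.0, 1.0, 1.0])
base = response_curve(toy_model, sample, 0, grid)

# Perturb one passive feature at a time and regenerate the curve.
changed_x1 = sample.copy()
changed_x1[1] = 2.0
changed_x2 = sample.copy()
changed_x2[2] = 2.0

d_x1 = shape_distance(base, response_curve(toy_model, changed_x1, 0, grid))
d_x2 = shape_distance(base, response_curve(toy_model, changed_x2, 0, grid))
print(f"shape change from x1: {d_x1:.3f}, from x2: {d_x2:.3f}")
```

In this toy setting, modifying `x1` alters the shape of the curve, whereas modifying `x2` only shifts it vertically. A counterfactual search of the kind described above would therefore be expected to select `x1` rather than `x2`.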