Explainability in yield prediction helps us fully explore the potential of machine learning models that already achieve high accuracy for a variety of yield prediction scenarios. The data used to predict yields are intricate, and the resulting models are often difficult to understand. However, understanding such models can be simplified by using natural groupings of the input features, for example grouping by the time at which a feature is captured or by the sensor used to capture it. The state of the art for interpreting machine learning models is currently defined by the game-theoretic approach of Shapley values. To handle groups of features, the per-feature Shapley values are typically summed, ignoring the theoretical limitations of this approach. We explain the concept of Shapley values computed directly for predefined groups of features and introduce an algorithm to compute them efficiently on tree structures. We provide a blueprint for designing swarm plots that combine many local explanations into a global understanding. An extensive evaluation on two different yield prediction problems shows the value of our approach and demonstrates how it can enable a better understanding of yield prediction models in the future, ultimately leading to a mutual enrichment of research and application.
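For reference, the group-level quantity the abstract refers to can be sketched with the classical Shapley value and its direct extension to feature groups; this is a standard formalization in which each predefined group acts as a single player, and is not necessarily the paper's exact notation. With feature set $N$ and a value function $v(S)$ giving the model's expected output when only the features in $S \subseteq N$ are known, the Shapley value of feature $i$ is

\[
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \bigl( v(S \cup \{i\}) - v(S) \bigr).
\]

For a predefined partition $P = \{G_1, \dots, G_k\}$ of $N$, the group Shapley value treats each group as one player, so coalitions are unions of whole groups:

\[
\phi_{G_j}(v) = \sum_{Q \subseteq P \setminus \{G_j\}} \frac{|Q|!\,(|P|-|Q|-1)!}{|P|!} \Bigl( v\Bigl(\bigcup_{G \in Q} G \cup G_j\Bigr) - v\Bigl(\bigcup_{G \in Q} G\Bigr) \Bigr).
\]

In general $\phi_{G_j}(v) \neq \sum_{i \in G_j} \phi_i(v)$; the two coincide only in special cases, which is the theoretical limitation of simply summing per-feature Shapley values alluded to above.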
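As a point of contrast, the baseline practice the abstract criticizes can be illustrated with a minimal sketch, assuming a scikit-learn tree ensemble and the shap package; the dataset, feature names, and groups below are synthetic placeholders, and this is not the paper's group-Shapley algorithm.

```python
# Minimal sketch of the common baseline: per-feature SHAP values summed
# within predefined feature groups. This is the practice whose theoretical
# limitations the abstract points out, not the paper's own algorithm.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a yield dataset: two "sensors", three features each.
feature_names = ["s1_t1", "s1_t2", "s1_t3", "s2_t1", "s2_t2", "s2_t3"]
groups = {"sensor_1": [0, 1, 2], "sensor_2": [3, 4, 5]}

X = rng.normal(size=(500, len(feature_names)))
y = X[:, :3].sum(axis=1) * X[:, 3] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Per-feature Shapley values from TreeSHAP: shape (n_samples, n_features).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Naive group attribution: add the per-feature values within each group.
group_shap = np.column_stack(
    [shap_values[:, idx].sum(axis=1) for idx in groups.values()]
)
print(dict(zip(groups, np.abs(group_shap).mean(axis=0))))
```

Rendering `group_shap` with a beeswarm-style summary, e.g. `shap.summary_plot(group_shap, feature_names=list(groups))`, yields the kind of swarm plot of many local, group-level explanations that the abstract describes.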