The ability to interpret machine learning models has become increasingly important as their use in data science continues to rise. Most current interpretability methods are optimized to work on either (\textit{i}) a global scale, where the goal is to rank features based on their contributions to overall variation in an observed population, or (\textit{ii}) a local level, where the goal is to detail how important a feature is to a particular individual in the dataset. In this work, we present the ``GlObal And Local Score'' (GOALS) operator: a simple \textit{post hoc} approach to simultaneously assess local and global variable importance in nonlinear models. Motivated by problems in statistical genetics, we demonstrate our approach using Gaussian process regression, a setting in which understanding how genetic markers affect trait architecture both among individuals and across populations is of high interest. With detailed simulations and real data analyses, we illustrate the flexibility and efficiency of GOALS compared with state-of-the-art variable importance strategies.