The ability to explain decisions made by machine learning models remains one of the most significant hurdles to the widespread adoption of AI in highly sensitive areas such as medicine, cybersecurity, and autonomous driving. There is great interest in understanding which features of the input data drive model decision making. In this contribution, we propose a novel approach to identifying relevant features of the input data, inspired by methods from the field of energy landscapes developed in the physical sciences. By identifying weights that are conserved within groups of minima of the loss landscape, we can identify the drivers of model decision making. Analogues of this idea exist in the molecular sciences, where coordinate invariants or order parameters are employed to identify critical features of a molecule. However, no such approach exists for machine learning loss landscapes. We demonstrate the applicability of energy landscape methods to machine learning models and give examples, both synthetic and from the real world, of how these methods can help make models more interpretable.
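As a rough illustration of the conserved-weights idea, the sketch below flags weights whose values barely change across a group of loss minima. It is a minimal sketch and not the method of this work: the function name `conserved_weight_indices`, the standard-deviation criterion, the tolerance `tol`, and the toy data are all assumptions made for illustration. It also presumes that the minima are already expressed in a common parameterisation (for example, with hidden-unit permutations resolved), so that the same index refers to the same weight in every minimum.

```python
import numpy as np


def conserved_weight_indices(minima, tol=0.05):
    """Return indices of weights that are nearly constant across minima.

    minima: array-like of shape (n_minima, n_weights), one row per minimum.
    tol:    hypothetical absolute threshold on the per-weight standard deviation.
    """
    minima = np.asarray(minima, dtype=float)
    # Spread of each weight over the group of minima (population std, ddof=0).
    spread = minima.std(axis=0)
    # A weight counts as "conserved" here if its spread is at most tol.
    return np.flatnonzero(spread <= tol)


# Toy data (assumed, not from the source): three minima, three weights each.
toy_minima = [
    [1.00, -0.5, 2.00],
    [1.01, 0.7, 2.00],
    [0.99, 0.1, 1.98],
]

conserved = conserved_weight_indices(toy_minima, tol=0.05).tolist()
print(conserved)  # expected: [0, 2]
```

In this toy case the first and third weights are almost identical in every minimum, while the second varies widely, so only the first and third are flagged. An absolute threshold is the simplest choice; a relative or rank-based criterion may suit weights on very different scales better.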