As machine learning black boxes are increasingly deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, a gradient-based method, and a variant of LIME, a perturbation-based method. More specifically, we derive explicit closed-form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite-sample complexity bounds on the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory through extensive experimentation on both synthetic and real-world datasets.
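As a minimal sketch of the gradient-averaging scheme SmoothGrad uses, the snippet below averages gradients over Gaussian-perturbed copies of an input. The function and parameter names (`grad_fn`, `sigma`, `n_samples`) are illustrative choices, not part of the paper's formal setup; for a model with locally linear gradients, the zero-mean noise averages out and the estimate concentrates around the plain gradient, consistent with the convergence-in-expectation result described above.

```python
import numpy as np

def smoothgrad(grad_fn, x, sigma=0.1, n_samples=50, seed=0):
    """Average grad_fn over n_samples Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    grads = [grad_fn(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

# Toy check: for f(x) = x^T x the gradient is 2x, so the smoothed
# estimate should be close to 2x once n_samples is large.
grad_fn = lambda x: 2.0 * x
x = np.array([1.0, -0.5])
estimate = smoothgrad(grad_fn, x, sigma=0.05, n_samples=1000)
```

The `n_samples` parameter is exactly the quantity the paper's finite-sample bounds control: it governs how far the Monte Carlo average can stray from the expected explanation.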