Unexplainable black-box models create scenarios where anomalies cause deleterious responses, posing unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI) to improve trust by evaluating local interpretability in black-box neural networks. Unfortunately, ground truth is unavailable for the model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of the user's trust by exploring a black-box model's latent feature space. We present an approach, ProtoShotXAI, that uses a Prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query sample and recording the response for a subset of exemplars from any class. Our approach is the first locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet and demonstrate, both quantitatively and qualitatively, that ProtoShotXAI provides more flexibility for model exploration. Finally, ProtoShotXAI demonstrates novel explainability and detectability on adversarial samples.
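A minimal sketch of the perturbation probe the abstract describes, not the authors' released implementation: class exemplars are embedded into a prototype with a few-shot encoder, then patches of a query image are occluded and the change in similarity to the prototype is recorded as an attribution map. The `encoder` argument, the patch occlusion scheme, and the cosine similarity score are illustrative assumptions rather than details taken from ProtoShotXAI.

```python
import torch
import torch.nn.functional as F

def class_prototype(encoder, support_images):
    """Mean embedding of a class's support exemplars (prototypical-network style)."""
    with torch.no_grad():
        feats = encoder(support_images)          # (n_support, d)
    return feats.mean(dim=0)                     # (d,)

def occlusion_attribution(encoder, query_image, prototype, patch=8):
    """Attribution map: drop in cosine similarity when each patch is zeroed out."""
    _, h, w = query_image.shape                  # query_image: (C, H, W)
    with torch.no_grad():
        base = F.cosine_similarity(encoder(query_image.unsqueeze(0)),
                                   prototype.unsqueeze(0)).item()
    attribution = torch.zeros(h // patch, w // patch)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            perturbed = query_image.clone()
            perturbed[:, i:i + patch, j:j + patch] = 0.0   # occlude one patch
            with torch.no_grad():
                score = F.cosine_similarity(encoder(perturbed.unsqueeze(0)),
                                            prototype.unsqueeze(0)).item()
            attribution[i // patch, j // patch] = base - score
    return attribution   # high values mark patches the class score depends on
```

Repeating the probe with prototypes built from different classes is what lets a user contrast how the same query features contribute to competing class scores.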