A Curated List of AI Interpretability (XAI) Literature


Overview

Anh M. Nguyen of Auburn University has published a curated collection of AI-interpretability literature on GitHub, covering tool libraries, surveys, and interpretability work published at major conferences and journals in recent years.


Author | Anh M. Nguyen

Compiled by | Xiaowen


https://github.com/anguyen8/XAI-papers

GUI tools

  • DeepVis: Deep Visualization Toolbox. Yosinski et al. 2015 code | pdf 

    • https://github.com/yosinski/deep-visualization-toolbox

    • http://yosinski.com/deepvis

  • SWAP: Generate adversarial poses of objects in a 3D space. Alcorn et al. 2018 code | pdf

    • https://github.com/airalcorn2/strike-with-a-pose

    • https://arxiv.org/abs/1811.11553


Libraries

  • CNN visualizations (activation maximization, PyTorch)

    • https://github.com/utkuozbulak/pytorch-cnn-visualizations

  • iNNvestigate (heatmaps, Keras)

    • https://github.com/albermax/innvestigate

  • DeepExplain (heatmaps, Keras)

    • https://github.com/marcoancona/DeepExplain

  • Lucid (activation maximization, heatmaps, TensorFlow)

    • https://github.com/tensorflow/lucid


Surveys

  • Methods for Interpreting and Understanding Deep Neural Networks. Montavon et al. 2017 pdf

    • https://arxiv.org/pdf/1706.07979.pdf

  • Visualizations of Deep Neural Networks in Computer Vision: A Survey. Seifert et al. 2017 pdf

    • https://link.springer.com/chapter/10.1007/978-3-319-54024-5_6

  • How convolutional neural network see the world - A survey of convolutional neural network visualization methods. Qin et al. 2018 pdf

    • https://arxiv.org/abs/1804.11191

  • A brief survey of visualization methods for deep learning models from the perspective of Explainable AI. Chalkiadakis 2018 pdf

    • https://www.macs.hw.ac.uk/~ic14/IoannisChalkiadakis_RRR.pdf

  • A Survey Of Methods For Explaining Black Box Models. Guidotti et al. 2018 pdf

    • https://arxiv.org/pdf/1802.01933.pdf

  • Understanding Neural Networks via Feature Visualization: A survey. Nguyen et al. 2019 pdf

    • https://arxiv.org/pdf/1904.08939.pdf

  • Explaining Explanations: An Overview of Interpretability of Machine Learning. Gilpin et al. 2019 pdf 

    • https://arxiv.org/pdf/1806.00069.pdf

Definitions of Interpretability

  • The Mythos of Model Interpretability. Lipton 2016 pdf

    • https://arxiv.org/abs/1606.03490

  • Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim. 2017 pdf

    • https://arxiv.org/pdf/1702.08608.pdf

  • Interpretable machine learning: definitions, methods, and applications. Murdoch et al. 2019 pdf 

    • https://arxiv.org/pdf/1901.04592v1.pdf


Books

  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Molnar 2019 pdf

    • https://christophm.github.io/interpretable-ml-book/


A. Explaining inner-workings

A1. Visualizing Preferred Stimuli

Synthesizing images / Activation Maximization

  • AM: Visualizing higher-layer features of a deep network. Erhan et al. 2009 pdf

    • https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network

  • Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf

    • https://arxiv.org/pdf/1312.6034.pdf

  • DeepVis: Understanding Neural Networks through Deep Visualization. Yosinski et al. 2015 pdf | url

    • http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf

    • http://yosinski.com/deepvis

  • MFV: Multifaceted Feature Visualization: Uncovering the different types of features learned by each neuron in deep neural networks. Nguyen et al. 2016 pdf | code

    • http://www.evolvingai.org/files/mfv_icml_workshop_16.pdf

    • https://github.com/Evolving-AI-Lab/mfv

  • DGN-AM: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen et al. 2016 pdf | code

    • http://anhnguyen.me/project/synthesizing

    • https://github.com/Evolving-AI-Lab/synthesizing

  • PPGN: Plug and Play Generative Networks. Nguyen et al. 2017 pdf | code

    • http://anhnguyen.me/project/ppgn

    • https://github.com/Evolving-AI-Lab/ppgn

  • Feature Visualization. Olah et al. 2017 url

    • https://distill.pub/2017/feature-visualization

  • Diverse feature visualizations reveal invariances in early layers of deep neural networks. Cadena et al. 2018 pdf

    • https://arxiv.org/pdf/1807.10589.pdf

  • Computer Vision with a Single (Robust) Classifier. Santurkar et al. 2019 pdf | blog | code 

    • https://arxiv.org/abs/1906.09453 

    • http://gradsci.org/robust_apps 

    • https://github.com/MadryLab/robustness_applications
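
The papers above share one core recipe: start from a (near-)random image and ascend the gradient of a unit's activation with respect to the pixels. Below is a minimal sketch of that recipe, assuming a recent torchvision with ImageNet weights; the class index is an arbitrary example, and the regularizers that make the papers' results recognizable (jitter, priors, generator networks) are omitted.

```python
# Minimal activation maximization: gradient ascent on the input pixels to
# maximize one class logit (Erhan et al. 2009; Simonyan et al. 2013).
# Assumes a recent torchvision; the class index is illustrative only.
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130  # arbitrary ImageNet class, for illustration
img = torch.full((1, 3, 224, 224), 0.5) + 0.01 * torch.randn(1, 3, 224, 224)
img.requires_grad_(True)
opt = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    opt.zero_grad()
    logit = model(img)[0, target_class]
    # Ascend the logit; a small L2 penalty keeps pixel values tame.
    loss = -logit + 1e-4 * img.pow(2).sum()
    loss.backward()
    opt.step()
    img.data.clamp_(0, 1)  # keep the result displayable as an image
```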


Real images / Segmentation Masks

  • Visualizing and Understanding Recurrent Networks. Karpathy et al. 2015 pdf

    • https://arxiv.org/abs/1506.02078

  • Object Detectors Emerge in Deep Scene CNNs. Zhou et al. 2015 pdf

    • https://arxiv.org/abs/1412.6856

  • Understanding Deep Architectures by Interpretable Visual Summaries pdf 

    • https://arxiv.org/pdf/1801.09103.pdf


A2. Inverting Neural Networks

  • Understanding Deep Image Representations by Inverting Them pdf

    • https://arxiv.org/abs/1412.0035

  • Inverting Visual Representations with Convolutional Networks pdf

    • https://arxiv.org/abs/1506.02753

  • Neural network inversion beyond gradient descent pdf 

    • http://opt-ml.org/papers/OPT2017_paper_38.pdf
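
These methods invert a representation by optimizing an image whose features match a reference image's features at a chosen layer. A minimal sketch under the same torchvision assumption, with a random tensor standing in for the reference photo and a simple total-variation prior:

```python
# Representation inversion: recover an image from its features by matching
# activations at a fixed layer, plus a TV prior for smoothness.
import torch
from torchvision import models

feats = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in feats.parameters():
    p.requires_grad_(False)

reference = torch.rand(1, 3, 224, 224)       # stand-in for a real photo
target = feats(reference).detach()

x = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.02)
for step in range(300):
    opt.zero_grad()
    match = (feats(x) - target).pow(2).mean()             # feature match
    tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
       + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()    # smoothness prior
    (match + 1e-2 * tv).backward()
    opt.step()
    x.data.clamp_(0, 1)
```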


A3. Distilling DNNs into more interpretable models

  • Interpreting CNNs via Decision Trees pdf

    • https://arxiv.org/abs/1802.00121

  • Distilling a Neural Network Into a Soft Decision Tree pdf

    • https://arxiv.org/abs/1711.09784

  • Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et al. 2018 pdf

    • https://arxiv.org/abs/1710.06169

  • Improving the Interpretability of Deep Neural Networks with Knowledge Distillation. Liu et al. 2018 pdf 

    • https://arxiv.org/pdf/1812.10924.pdf
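
The common pattern in this section is to fit a transparent surrogate to a black box's own predictions and read explanations off the surrogate. A toy sketch with scikit-learn on synthetic data (the papers above distill real DNNs with far more care):

```python
# Distillation into a transparent surrogate: train a shallow decision tree
# on the black box's predictions, then inspect the rules. Synthetic data;
# all hyperparameters here are illustrative only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # toy ground-truth rule

black_box = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
teacher_labels = black_box.predict(X)    # the signal we distill

tree = DecisionTreeClassifier(max_depth=3).fit(X, teacher_labels)
print(export_text(tree))                 # human-readable decision rules
print("fidelity to the black box:", (tree.predict(X) == teacher_labels).mean())
```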


A4. Quantitatively characterizing hidden features

  • TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. Kim et al. 2018 pdf | code

    • https://arxiv.org/abs/1711.11279

    • https://github.com/tensorflow/tcav

    • Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks. Ghorbani et al. 2019 pdf 

      • https://arxiv.org/abs/1902.03129

  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Raghu et al. 2017 pdf | code

    • https://arxiv.org/abs/1706.05806

    • https://github.com/google/svcca

  • A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens. Saini et al. 2018 pdf

    • https://arxiv.org/abs/1806.02012

  • Network Dissection: Quantifying Interpretability of Deep Visual Representations. Bau et al. 2017 url | pdf

    • http://netdissect.csail.mit.edu/

    • http://netdissect.csail.mit.edu/final-network-dissection.pdf

    • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. Bau et al. 2018 pdf 

      • https://arxiv.org/abs/1811.10597

    • Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks. Fong & Vedaldi 2018 pdf 

      • https://arxiv.org/abs/1801.03454
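
To give a feel for the CAV idea behind TCAV: learn a linear "concept" direction in a hidden layer, then check how often the class score's gradient points along it. The sketch below uses a tiny untrained network and synthetic "concept" inputs, so the score itself is meaningless; it only shows the mechanics. The released TCAV code linked above is the reference implementation.

```python
# CAV mechanics on synthetic data: a linear probe separating "concept"
# activations from random ones gives the CAV; the TCAV score is the
# fraction of inputs whose class-logit gradient has a positive component
# along that direction. Toy network, illustrative only.
import torch
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 2))
hidden = net[:2]                                   # layer being probed

concept_in = torch.randn(100, 10) + 2.0            # stand-in concept inputs
random_in = torch.randn(100, 10)
acts = torch.cat([hidden(concept_in), hidden(random_in)]).detach().numpy()
probe = LogisticRegression().fit(acts, [1] * 100 + [0] * 100)
cav = torch.tensor(probe.coef_[0], dtype=torch.float32)

x = torch.randn(200, 10)
h = hidden(x).detach().requires_grad_(True)        # activations as leaves
logit = net[2](h)[:, 1].sum()
grads = torch.autograd.grad(logit, h)[0]           # d logit / d activation
print("TCAV score:", ((grads @ cav) > 0).float().mean().item())
```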


A5. Network surgery

  • How Important Is a Neuron? Dhamdhere et al. 2018 pdf 

    • https://arxiv.org/pdf/1805.12233.pdf


A6. Sensitivity analysis

  • NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models. Liu et al. 2018 pdf 

    • http://www.sci.utah.edu/~shusenl/publications/paper_entailVis.pdf

B. Decision explanations

B1. Heatmaps

B1.1 White-box / Gradient-based

  • A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks pdf 

    • https://arxiv.org/pdf/1606.07757.pdf

Gradient

  • Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf

    • https://arxiv.org/pdf/1312.6034.pdf

  • Deconvnet: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf 

    • https://arxiv.org/pdf/1311.2901.pdf

  • Guided-backprop: Striving for simplicity: The all convolutional net. Springenberg et al. 2015 pdf 

    • http://arxiv.org/pdf/1412.6806.pdf
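
All three methods start from the same quantity, the gradient of a class score with respect to the pixels; Deconvnet and guided backprop only change how ReLU is handled on the backward pass. A minimal vanilla-gradient sketch, assuming a recent torchvision and with a random tensor standing in for a real image:

```python
# Vanilla gradient saliency (Simonyan et al. 2013): the heatmap is the
# gradient of the class score with respect to the input pixels.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in image
score = model(x)[0].max()                           # top-class score
score.backward()
saliency = x.grad.abs().max(dim=1)[0]               # 224x224 map (max over RGB)
```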

Input x Gradient

  • DeepLIFT: Learning important features through propagating activation differences. Shrikumar et al. 2017 pdf 

    • https://arxiv.org/pdf/1605.01713.pdf

  • Integrated Gradients: Axiomatic Attribution for Deep Networks. Sundararajan et al. 2017 pdf | code

    • http://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf 

    • https://github.com/ankurtaly/Integrated-Gradients

  • I-GOS: Visualizing Deep Networks by Optimizing with Integrated Gradients. Qi et al. 2019 pdf

    • https://arxiv.org/pdf/1905.00954.pdf

  • LRP: Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation pdf 

    • https://arxiv.org/abs/1712.08268

    • DTD: Explaining Nonlinear Classification Decisions With Deep Taylor Decomposition pdf

      • https://arxiv.org/abs/1512.02479
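
Integrated Gradients in particular is compact enough to sketch: average the input gradients along a straight path from a baseline to the input, then scale by the input-baseline difference. A rough version assuming torchvision (the official code is linked above); step count and baseline are design choices:

```python
# Integrated Gradients sketch: path-averaged input gradients, scaled by
# (input - baseline). A black image serves as the baseline here.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)                 # stand-in input
baseline = torch.zeros_like(x)
with torch.no_grad():
    target = model(x).argmax()                 # explain the predicted class

grads = []
for alpha in torch.linspace(0, 1, 50):
    xi = (baseline + alpha * (x - baseline)).requires_grad_(True)
    score = model(xi)[0, target]
    grads.append(torch.autograd.grad(score, xi)[0])
attribution = (x - baseline) * torch.stack(grads).mean(dim=0)
```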

Activation map

  • CAM: Learning Deep Features for Discriminative Localization. Zhou et al. 2016 code | web 

    • https://github.com/metalbubble/CAM 

    • http://cnnlocalization.csail.mit.edu/

  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju et al. 2017 pdf 

    • https://arxiv.org/abs/1610.02391

  • Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. Chattopadhyay et al. 2017 pdf | code 

    • https://arxiv.org/abs/1710.11063 

    • https://github.com/adityac94/Grad_CAM_plus_plus

  • Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models. Omeiza et al. 2019 pdf 

    • https://arxiv.org/pdf/1908.01224.pdf
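
The CAM family reads the heatmap out of convolutional feature maps rather than pixels. A minimal Grad-CAM sketch, assuming torchvision's ResNet-18 and hooking its last convolutional block:

```python
# Grad-CAM sketch: weight the last conv block's feature maps by their
# spatially pooled gradients, then ReLU and upsample. Hooks capture the
# activations and the gradient flowing back through them.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
store = {}

def hook(module, inputs, output):
    store["act"] = output
    output.register_hook(lambda g: store.update(grad=g))

model.layer4.register_forward_hook(hook)

x = torch.rand(1, 3, 224, 224)                     # stand-in image
out = model(x)
out[0, out.argmax()].backward()

weights = store["grad"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
cam = F.relu((weights * store["act"]).sum(dim=1))        # coarse 7x7 map
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0, 0]
```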

Learning the heatmap

  • Interpretable Explanations of Black Boxes by Meaningful Perturbation. Fong et al. 2017 pdf 

    • http://openaccess.thecvf.com/content_ICCV_2017/papers/Fong_Interpretable_Explanations_of_ICCV_2017_paper.pdf

  • FIDO: Explaining image classifiers by counterfactual generation. Chang et al. 2019 pdf 

    • https://arxiv.org/pdf/1807.08024.pdf

  • FG-Vis: Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks. Wagner et al. 2019 pdf 

    • http://openaccess.thecvf.com/content_CVPR_2019/papers/Wagner_Interpretable_and_Fine-Grained_Visual_Explanations_for_Convolutional_Neural_Networks_CVPR_2019_paper.pdf
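
These "learned heatmap" methods optimize the explanation itself. A rough sketch of the Fong & Vedaldi idea, with a cheap average-pool standing in for their blur and most of their regularizers dropped: optimize a coarse mask so that deleting the masked evidence collapses the class score while the mask stays small.

```python
# Learned-mask sketch after Fong & Vedaldi 2017 (simplified): replace
# masked pixels with a blurred image and ask the mask to destroy the class
# probability while staying small. Jitter and TV priors are omitted.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

x = torch.rand(1, 3, 224, 224)                        # stand-in image
with torch.no_grad():
    target = model(x).argmax()
blurred = F.avg_pool2d(x, 11, stride=1, padding=5)    # crude "deleted" image

m = torch.zeros(1, 1, 28, 28, requires_grad=True)     # coarse mask logits
opt = torch.optim.Adam([m], lr=0.1)
for step in range(150):
    opt.zero_grad()
    mask = torch.sigmoid(F.interpolate(m, size=(224, 224), mode="bilinear"))
    composite = mask * blurred + (1 - mask) * x       # delete masked evidence
    score = model(composite).softmax(dim=1)[0, target]
    (score + 0.05 * mask.mean()).backward()           # small mask that hurts
    opt.step()
# high mask values mark the pixels the prediction depends on most
```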

Others

  • Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. Oramas et al. 2019 pdf 

    • https://arxiv.org/pdf/1712.06302.pdf

  • Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks. Seo et al. 2018 pdf 

    • https://arxiv.org/pdf/1807.11720.pdf

B1.2 Black-box / Perturbation-based

  • Occlusion: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf 

    • https://arxiv.org/pdf/1311.2901.pdf

  • PDA: Visualizing deep neural network decisions: Prediction difference analysis. Zintgraf et al. 2017 pdf 

    • https://arxiv.org/pdf/1702.04595.pdf

  • RISE: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. 2018 pdf 

    • https://arxiv.org/pdf/1806.07421.pdf

  • LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Ribeiro et al. 2016 pdf | blog

    • https://arxiv.org/pdf/1602.04938.pdf 

    • https://homes.cs.washington.edu/~marcotcr/blog/lime/

  • SHAP: A Unified Approach to Interpreting Model Predictions. Lundberg et al. 2017 pdf | code 

    • https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf 

    • https://github.com/slundberg/shap

  • OSFT: Interpreting Black Box Models via Hypothesis Testing. Burns et al. 2019 pdf 

    • https://arxiv.org/pdf/1904.00045.pdf
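
Occlusion is the simplest of these and needs no gradients: slide a gray patch over the image and record how far the predicted class's probability drops at each position. A sketch, with arbitrary patch size, stride, and fill value:

```python
# Occlusion heatmap (Zeiler & Fergus 2014 style): the score drop caused by
# covering each region measures that region's importance.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)                    # stand-in image
patch, stride = 32, 16
n = (224 - patch) // stride + 1
heat = torch.zeros(n, n)

with torch.no_grad():
    probs = model(x).softmax(dim=1)
    target, base = probs.argmax(), probs.max()
    for i in range(n):
        for j in range(n):
            occluded = x.clone()
            r, c = i * stride, j * stride
            occluded[..., r:r+patch, c:c+patch] = 0.5   # gray patch
            p = model(occluded).softmax(dim=1)[0, target]
            heat[i, j] = base - p                        # drop = importance
```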

B1.3 Evaluating heatmaps

  • The (Un)reliability of saliency methods. Kindermans et al. 2018 pdf 

    • https://openreview.net/forum?id=r1Oen--RW

  • Sanity Checks for Saliency Maps. Adebayo et al. 2018 pdf 

    • http://papers.nips.cc/paper/8160-sanity-checks-for-saliency-maps.pdf

  • A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. Nie et al. 2018 pdf 

    • https://arxiv.org/abs/1805.07039

  • BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth. Yang et al. 2019 pdf 

    • https://arxiv.org/abs/1907.09701

  • On the (In)fidelity and Sensitivity for Explanations. Yeh et al. 2019 pdf 

    • https://arxiv.org/pdf/1901.09392.pdf
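
The model-randomization sanity check from Adebayo et al. is easy to reproduce in outline: a saliency method that ignores the weights is suspect, so compare maps before and after reinitializing part of the model. A sketch using vanilla gradients:

```python
# Model-randomization sanity check (after Adebayo et al. 2018): a faithful
# saliency map should change when the weights do.
import torch
from torchvision import models

def grad_map(model, x):
    x = x.clone().requires_grad_(True)
    model(x)[0].max().backward()
    return x.grad.abs().max(dim=1)[0].flatten()

x = torch.rand(1, 3, 224, 224)                   # stand-in image
model = models.resnet18(weights="IMAGENET1K_V1").eval()
before = grad_map(model, x)

model.fc.reset_parameters()                      # randomize the top layer
after = grad_map(model, x)

# A correlation near 1 would be a red flag for the saliency method.
corr = torch.corrcoef(torch.stack([before, after]))[0, 1]
print("correlation after randomization:", corr.item())
```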

B2. Learning to explain

  • Learning how to explain neural networks: PatternNet and PatternAttribution pdf 

    • https://arxiv.org/abs/1705.05598

  • Deep Learning for Case-Based Reasoning through Prototypes pdf 

    • https://arxiv.org/pdf/1710.04806.pdf

  • Unsupervised Learning of Neural Networks to Explain Neural Networks pdf 

    • https://arxiv.org/abs/1805.07468

  • Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions pdf 

    • https://arxiv.org/abs/1901.03729

    • Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations pdf 

      • https://arxiv.org/pdf/1702.07826.pdf

  • Towards robust interpretability with self-explaining neural networks. Alvarez-Melis and Jaakkola 2018 pdf

    • http://people.csail.mit.edu/tommi/papers/SENN_paper.pdf

C. Counterfactual explanations

  • Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections. Zhang et al. 2018 pdf 

    • http://papers.nips.cc/paper/7736-interpreting-neural-network-judgments-via-minimal-stable-and-symbolic-corrections.pdf

  • Counterfactual Visual Explanations. Goyal et al. 2019 pdf 

    • https://arxiv.org/pdf/1904.07451.pdf
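
A common baseline behind such methods: search for the smallest input edit that flips the model to a chosen target class. A gradient-based sketch with an L1 penalty for sparsity (the papers above add stability and symbolic machinery on top); the target index is an arbitrary example:

```python
# Gradient-based counterfactual search: find a small, sparse input edit
# that makes the model predict a chosen target class.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

x = torch.rand(1, 3, 224, 224)            # stand-in input
target = torch.tensor([207])              # arbitrary counterfactual class
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)

for step in range(300):
    opt.zero_grad()
    logits = model((x + delta).clamp(0, 1))
    loss = F.cross_entropy(logits, target) + 0.01 * delta.abs().mean()
    loss.backward()
    opt.step()
    if logits.argmax().item() == target.item():
        break                             # model now predicts the target
# (x + delta) is the counterfactual; delta shows what had to change
```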

D. Others

  • Explainable Artificial Intelligence via Bayesian Teaching. Yang & Shafto. NIPS 2017 pdf

    • http://shaftolab.com/assets/papers/yangShafto_NIPS_2017_machine_teaching.pdf

  • Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation pdf 

    • http://www.antoniosliapis.com/papers/explainable_ai_for_designers.pdf

  • ICADx: Interpretable computer aided diagnosis of breast masses. Kim et al. 2018 pdf 

    • https://arxiv.org/abs/1805.08960

  • Neural Network Interpretation via Fine Grained Textual Summarization. Guo et al. 2018 pdf 

    • https://arxiv.org/pdf/1805.08969.pdf

  • LS-Tree: Model Interpretation When the Data Are Linguistic. Chen et al. 2019 pdf 

    • https://arxiv.org/abs/1902.04187



-END-

