Concept Bottleneck Models (CBMs) map inputs to a set of interpretable concepts (``the bottleneck'') and use the concepts to make predictions. A concept bottleneck enhances interpretability, since it can be inspected to understand which concepts the model ``sees'' in an input and which of those concepts are deemed important for the prediction. However, CBMs are restrictive in practice: they require concept labels in the training data to learn the bottleneck, and they do not leverage strong pretrained models. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them. In this work, we address these limitations by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available in the training data, we show that PCBMs can transfer concepts from other datasets or from natural-language descriptions of concepts. PCBMs also enable users to quickly debug and update the model to reduce spurious correlations and to improve generalization to new (potentially different) data. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using any data from the target domain or retraining the model.
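To make the post-hoc construction concrete, below is a minimal sketch of one way such a bottleneck could be assembled, under two assumptions not detailed in this abstract: (1) embeddings from a frozen pretrained backbone are available, and (2) one direction vector per concept has been obtained (e.g., from concept-probe data or from normalized text embeddings of concept names). All names, shapes, and data here are illustrative, not the authors' released code.

```python
# A hedged sketch of a post-hoc concept bottleneck: project frozen backbone
# embeddings onto concept directions, then fit a sparse linear head on the
# resulting concept activations. Illustrative only; synthetic data throughout.
import numpy as np
from sklearn.linear_model import SGDClassifier


def concept_activations(embeddings: np.ndarray, concept_vectors: np.ndarray) -> np.ndarray:
    """Project each embedding onto each unit-norm concept direction.

    embeddings: (n_samples, d); concept_vectors: (n_concepts, d).
    Returns an (n_samples, n_concepts) matrix of concept activations.
    """
    return embeddings @ concept_vectors.T


# Hypothetical setup: 1000 inputs embedded into d=512 by a frozen backbone,
# 20 assumed concept directions, and a 5-way classification label.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 512))
cvs = rng.normal(size=(20, 512))
cvs /= np.linalg.norm(cvs, axis=1, keepdims=True)  # unit-norm concept vectors
y = rng.integers(0, 5, size=1000)

proj = concept_activations(emb, cvs)

# Sparse interpretable head: elastic-net regularization keeps each class's
# weights concentrated on a few concepts, so predictions can be read off
# concept-by-concept.
head = SGDClassifier(loss="log_loss", penalty="elasticnet", alpha=1e-3, l1_ratio=0.99)
head.fit(proj, y)

# One way concept-level editing could be realized: zero out the weight of a
# concept flagged as spurious (here, hypothetical concept index 3) for all
# classes, with no target-domain data and no retraining.
spurious_concept = 3
head.coef_[:, spurious_concept] = 0.0
```

Because the head is linear over concept activations, pruning a concept's weight is a local, human-auditable edit; this is one plausible reading of the concept-level feedback described above, not a definitive account of the method.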