Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts (``the bottleneck'') and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model ``sees'' in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice, as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on the training data, we show that PCBMs can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models. A key benefit of PCBM is that it enables users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBM allows for global model edits, which can be more efficient than previous works on local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain or model retraining.