Deploying AI-powered systems requires trustworthy models supporting effective human interactions, going beyond raw prediction accuracy. Concept bottleneck models promote trustworthiness by conditioning classification tasks on an intermediate level of human-like concepts. This enables human interventions which can correct mispredicted concepts to improve the model's performance. However, existing concept bottleneck models are unable to find optimal compromises between high task accuracy, robust concept-based explanations, and effective interventions on concepts -- particularly in real-world conditions where complete and accurate concept supervision is scarce. To address this, we propose Concept Embedding Models, a novel family of concept bottleneck models which goes beyond the current accuracy-vs-interpretability trade-off by learning interpretable high-dimensional concept representations. Our experiments demonstrate that Concept Embedding Models (1) attain better or competitive task accuracy w.r.t. standard neural models without concepts, (2) provide concept representations capturing meaningful semantics, including and beyond their ground-truth labels, (3) support test-time concept interventions whose effect on test accuracy surpasses that in standard concept bottleneck models, and (4) scale to real-world conditions where complete concept supervision is scarce.
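The mechanism described above can be illustrated with a toy forward pass. In this sketch, each concept is represented by a pair of high-dimensional embeddings (one for "concept active", one for "concept inactive"), mixed by the predicted concept probability; a test-time intervention replaces that probability with the ground-truth label. All weights are random and all class and function names are hypothetical: this is a minimal NumPy illustration of the general idea, not the paper's trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ConceptEmbeddingSketch:
    """Toy forward pass of a concept-embedding-style bottleneck (hypothetical).

    For each concept we produce a positive and a negative embedding plus a
    predicted probability; the concept's representation is the
    probability-weighted mix of the two embeddings. An intervention pins
    the probability to the human-supplied label, which changes the mix
    fed to the downstream label predictor.
    """

    def __init__(self, in_dim, n_concepts, emb_dim, n_classes):
        self.n_concepts = n_concepts
        # One pair of embedding generators per concept (random, untrained).
        self.W_pos = rng.normal(size=(n_concepts, in_dim, emb_dim))
        self.W_neg = rng.normal(size=(n_concepts, in_dim, emb_dim))
        # Scoring function: maps [c_pos; c_neg] to a concept logit.
        self.w_score = rng.normal(size=2 * emb_dim)
        # Label predictor on the concatenated concept representations.
        self.W_task = rng.normal(size=(n_concepts * emb_dim, n_classes))

    def forward(self, x, interventions=None):
        """interventions: optional dict {concept_index: 0 or 1 ground truth}."""
        mixed, probs = [], []
        for i in range(self.n_concepts):
            c_pos = x @ self.W_pos[i]  # "concept active" embedding
            c_neg = x @ self.W_neg[i]  # "concept inactive" embedding
            p = sigmoid(np.concatenate([c_pos, c_neg]) @ self.w_score)
            if interventions and i in interventions:
                p = float(interventions[i])  # human overrides the prediction
            probs.append(p)
            mixed.append(p * c_pos + (1.0 - p) * c_neg)
        logits = np.concatenate(mixed) @ self.W_task
        return np.array(probs), logits

model = ConceptEmbeddingSketch(in_dim=8, n_concepts=3, emb_dim=4, n_classes=5)
x = rng.normal(size=8)
probs, logits = model.forward(x)
# Intervening on concept 0 pins its probability to the supplied label.
probs_iv, logits_iv = model.forward(x, interventions={0: 1})
```

Because the intervention rewrites the probability *before* the embeddings are mixed, the correction propagates to the label predictor, which is what makes concept interventions effective in this family of models.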