The Concept Bottleneck Models (CBMs) of Koh et al. [2020] provide a means to ensure that a neural-network-based classifier bases its predictions solely on human-understandable concepts. The concept labels, or rationales as we refer to them, are learned by the concept-labeling component of the CBM. Another component learns to predict the target classification label from these predicted concept labels. Unfortunately, these models are heavily reliant on human-provided concept labels for each datapoint. To enable CBMs to behave robustly when such labels are not readily available, we show how to equip them with the ability to abstain from predicting concepts when the concept-labeling component is uncertain. In other words, our model learns to provide rationales for its predictions, but only when it is confident that the rationale is correct.
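To make the architecture concrete, the following is a minimal sketch of a concept bottleneck model with abstention, not the authors' implementation: the class name `ConceptBottleneck`, the layer sizes, and the confidence threshold `tau` used to trigger abstention are illustrative assumptions, with uncertain concepts simply masked to an uninformative value before the label predictor is applied.

```python
# Minimal sketch (assumed, not the paper's method) of a CBM whose
# concept-labeling component can abstain when it is uncertain.
import torch
import torch.nn as nn


class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int, tau: float = 0.9):
        super().__init__()
        # Concept-labeling component: maps inputs to concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Label predictor: maps (possibly abstained) concepts to class logits.
        self.label_net = nn.Linear(n_concepts, n_classes)
        self.tau = tau  # illustrative confidence threshold for abstention

    def forward(self, x: torch.Tensor):
        concept_probs = torch.sigmoid(self.concept_net(x))
        # Confidence of each binary concept prediction: max(p, 1 - p).
        confidence = torch.maximum(concept_probs, 1.0 - concept_probs)
        # Abstain on concepts whose confidence falls below tau by
        # replacing them with an uninformative value of 0.5.
        abstain = confidence < self.tau
        rationale = torch.where(
            abstain, torch.full_like(concept_probs, 0.5), concept_probs
        )
        logits = self.label_net(rationale)
        return logits, concept_probs, abstain


if __name__ == "__main__":
    model = ConceptBottleneck(in_dim=16, n_concepts=8, n_classes=3)
    x = torch.randn(4, 16)
    logits, concepts, abstain = model(x)
    print(logits.shape, abstain.float().mean().item())
```

The thresholding rule above is only one of many possible abstention mechanisms; the key point it illustrates is that the label predictor sees concept predictions only when the concept-labeling component is confident in them.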