We are concerned with debugging concept-based gray-box models (GBMs). These models acquire task-relevant concepts appearing in the inputs and then compute a prediction by aggregating the concept activations. This work stems from the observation that in GBMs both the concepts and the aggregation function can be affected by different bugs, and that correcting these bugs requires different kinds of corrective supervision. To this end, we introduce a simple schema for identifying and prioritizing bugs in both components, and discuss possible implementations and open problems. At the same time, we introduce a new loss function for debugging the aggregation step that extends existing explanation-alignment approaches to GBMs by making them robust to how the concepts change during training.
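To make the two GBM components concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a sigmoid concept extractor and a linear aggregation step, and every function name (`concept_activations`, `aggregate`, `concept_relevance`, `alignment_penalty`, `total_loss`) is hypothetical. The penalty is a generic "right for the right reasons"-style term that punishes relevance placed on concepts a user flags as irrelevant. It is not the loss proposed in this work and does not model how concepts change during training.

```python
import numpy as np

def concept_activations(x, W_c):
    """Hypothetical concept extractor: sigmoid of a linear map.
    x: (n, d) inputs, W_c: (d, k) -> (n, k) concept activations in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(x @ W_c)))

def aggregate(c, w, b):
    """Assumed linear aggregation: class logits as a weighted sum of concepts.
    c: (n, k), w: (k, m), b: (m,) -> (n, m) logits."""
    return c @ w + b

def concept_relevance(c, w, target):
    """Per-concept contribution to the logit of class `target`.
    For a linear aggregator, c_j * w_jt is an exact decomposition of the logit (minus bias)."""
    return c * w[:, target]

def alignment_penalty(relevance, irrelevant_mask):
    """Squared relevance on concepts the user marked as irrelevant (mask entries are 0/1)."""
    return float(np.sum((relevance * irrelevant_mask) ** 2))

def cross_entropy(logits, y):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())

def total_loss(x, y, W_c, w, b, irrelevant_mask, lam=1.0):
    """Task loss plus a weighted explanation-alignment penalty on the aggregation step."""
    c = concept_activations(x, W_c)
    logits = aggregate(c, w, b)
    rel = concept_relevance(c, w, target=0)  # illustrative: explain class 0 only
    return cross_entropy(logits, y) + lam * alignment_penalty(rel, irrelevant_mask)

# Usage: with zero extractor weights every concept activation is 0.5.
x = np.ones((1, 4))
W_c = np.zeros((4, 3))
w = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
c = concept_activations(x, W_c)
logits = aggregate(c, w, b)
rel = concept_relevance(c, w, target=0)
mask = np.array([0.0, 1.0, 0.0])  # user says concept 1 should not matter
penalty = alignment_penalty(rel, mask)
```

Because the aggregator here is linear, concept relevance is read off directly from activations and weights; a non-linear aggregator would need an attribution method (for example, input gradients with respect to the concept activations) in place of `concept_relevance`.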