Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to achieve the same performance as black-box models without compromising transparency. ProtoPNets compute predictions based on similarity to class-specific part-prototypes learned to recognize parts of training examples, making it easy to faithfully determine which examples are responsible for any target prediction and why. However, like other models, they are prone to picking up confounds and shortcuts from the data, which compromises their prediction accuracy and limits their generalization. We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, supplies feedback in the form of which part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. An extensive empirical evaluation on synthetic and real-world data shows that ProtoPDebug outperforms state-of-the-art debuggers for a fraction of the annotation cost.