Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to match the performance of black-box models without compromising transparency. ProtoPNets compute predictions based on similarity to class-specific part-prototypes learned to recognize parts of training examples, making it easy to faithfully determine which examples are responsible for any target prediction and why. However, like other models, they are prone to picking up confounders and shortcuts from the data, and thus suffer from compromised prediction accuracy and limited generalization. We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, supplies feedback in the form of which part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. Our experimental evaluation shows that ProtoPDebug outperforms state-of-the-art debuggers at a fraction of the annotation cost. An online experiment with laypeople confirms the simplicity of the feedback requested from users and the effectiveness of the collected feedback for learning confounder-free part-prototypes. ProtoPDebug is a promising tool for trustworthy interactive learning in critical applications, as suggested by a preliminary evaluation on a medical decision-making task.
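To make the prediction mechanism and the forget/keep feedback concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that image parts are already embedded as vectors and that similarity is the log-ratio of squared distances commonly used in ProtoPNets. The function names (`similarity`, `predict`, `forget_penalty`, `keep_reward`) and the exact form of the two feedback terms are illustrative assumptions.

```python
import numpy as np

def similarity(patches, prototypes, eps=1e-4):
    """Similarity of every patch embedding to every part-prototype.

    patches: (n_patches, d); prototypes: (n_protos, d).
    Assumed form: log((d2 + 1) / (d2 + eps)), large when a patch is close
    to a prototype and near zero when it is far away.
    """
    diff = patches[:, None, :] - prototypes[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)          # squared L2 distances
    return np.log((d2 + 1.0) / (d2 + eps))  # shape (n_patches, n_protos)

def predict(patches, prototypes, class_weights):
    """Classify one image from its best-matching part per prototype.

    class_weights: (n_classes, n_protos), linking prototypes to classes.
    Returns the predicted class and the per-prototype activations, which
    serve as the explanation of the prediction.
    """
    activations = similarity(patches, prototypes).max(axis=0)
    logits = class_weights @ activations
    return int(np.argmax(logits)), activations

def forget_penalty(prototypes, forbidden):
    """Illustrative penalty: strongest activation of any prototype on parts
    that the supervisor flagged as confounders (to be minimized)."""
    return float(similarity(forbidden, prototypes).max())

def keep_reward(prototypes, valid):
    """Illustrative reward: the weakest best-match over parts that the
    supervisor marked as valid concepts (to be maximized)."""
    return float(similarity(valid, prototypes).max(axis=1).min())

# Toy example: two prototypes in a 2-D embedding space, one per class.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
class_weights = np.eye(2)
patches = np.array([[0.0, 0.0], [5.0, 5.0]])
label, activations = predict(patches, prototypes, class_weights)
```

In a debugging loop of the kind the abstract describes, a fine-tuning objective would add a term like `forget_penalty` and subtract a term like `keep_reward` alongside the usual classification loss, so that prototypes move away from flagged confounders while staying close to confirmed concepts. How these terms are weighted is left open here.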