Current approaches for designing self-explainable models (SEMs) require complicated training procedures and specific architectures, which makes them impractical. With the advance of general-purpose foundation models based on Vision Transformers (ViTs), this impracticality becomes even more problematic. Therefore, new methods are necessary to provide transparency and reliability to ViT-based foundation models. In this work, we present a new method, which we call Keypoint Counting Classifiers (KCCs), for turning any well-trained ViT-based model into a SEM without retraining. Recent works have shown that ViTs can automatically identify matching keypoints between images with high precision, and we build on these results to create an easily interpretable decision process that is inherently visualizable in the input space. We perform an extensive evaluation which shows that KCCs improve human-machine communication compared to recent baselines. We believe that KCCs constitute an important step towards making ViT-based foundation models more transparent and reliable.
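The abstract only sketches the idea, so the following is a minimal, hypothetical NumPy sketch of what a keypoint-counting decision rule could look like: it assumes ViT patch features have already been extracted for the query and reference images, and that keypoint matching is done via mutual nearest neighbours with a cosine-similarity threshold. The names (`match_count`, `kcc_predict`, `tau`) and the specific matching rule are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a keypoint-counting classifier over precomputed ViT patch features.
# The matching rule (mutual nearest neighbours + cosine-similarity threshold) is an
# assumption for illustration; the paper's actual KCC construction may differ.
import numpy as np

def match_count(query_feats: np.ndarray, ref_feats: np.ndarray, tau: float = 0.8) -> int:
    """Count mutual nearest-neighbour patch matches with cosine similarity >= tau.

    query_feats: (Nq, D) ViT patch embeddings of the test image.
    ref_feats:   (Nr, D) ViT patch embeddings of a reference image for one class.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = q @ r.T                                     # (Nq, Nr) cosine similarities
    best_r = sim.argmax(axis=1)                       # best reference patch per query patch
    best_q = sim.argmax(axis=0)                       # best query patch per reference patch
    mutual = best_q[best_r] == np.arange(len(q))      # mutual nearest-neighbour pairs
    strong = sim[np.arange(len(q)), best_r] >= tau    # keep only confident matches
    return int(np.sum(mutual & strong))

def kcc_predict(query_feats: np.ndarray, class_refs: dict[str, list[np.ndarray]]) -> str:
    """Predict the class whose reference images accumulate the most keypoint matches."""
    scores = {c: sum(match_count(query_feats, r) for r in refs)
              for c, refs in class_refs.items()}
    return max(scores, key=scores.get)
```

Because the prediction in such a scheme is literally a count of matched keypoints, each decision can be visualised by drawing the matched patch locations on the query and reference images, which is consistent with the abstract's claim that the decision process is inherently visualizable in the input.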


