A growing body of work studies Blindspot Discovery Methods (BDMs): methods that use an image embedding to find semantically meaningful subsets of the data (i.e., subsets united by a human-understandable concept) where an image classifier performs significantly worse. Motivated by gaps observed in prior work, we introduce SpotCheck, a new framework for evaluating BDMs that uses synthetic image datasets to train models with known blindspots, and PlaneSpot, a new BDM that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors that influence BDM performance (e.g., the number of blindspots in a model or the features used to define a blindspot), and we show that PlaneSpot is competitive with, and in many cases outperforms, existing BDMs. Importantly, we validate these findings with additional experiments on real image data from MS-COCO, a large image benchmark dataset. Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.