Vision Checklist: Towards Testable Error Analysis of Image Models to Help System Designers Interrogate Model Capabilities

Using large pre-trained models for image recognition tasks is becoming increasingly common, owing to the well-acknowledged success of recent models such as vision transformers and CNN-based models like VGG and ResNet. The high accuracy of these models on benchmark tasks has translated into practical use across many domains, including safety-critical applications like autonomous driving and medical diagnostics. Despite their widespread use, image models have been shown to be fragile under changes in the operating environment, which brings their robustness into question. There is an urgent need for methods that systematically characterise and quantify the capabilities of these models, to help designers understand and provide guarantees about their safety and robustness. In this paper, we propose Vision Checklist, a framework that interrogates the capabilities of a model and produces a report that a system designer can use for robustness evaluations. The framework proposes a set of perturbation operations that can be applied to the underlying data to generate test samples of different types. The perturbations reflect potential changes in operating environments and interrogate various properties, ranging from the strictly quantitative to the more qualitative. Our framework is evaluated on multiple datasets, namely Tiny ImageNet, CIFAR10, CIFAR100 and Camelyon17, and on models such as ViT and ResNet. Vision Checklist proposes a specific set of evaluations that can be integrated into the previously proposed concept of a model card. Robustness evaluations like our checklist will be crucial in future safety evaluations of visual perception modules, and will be useful for a wide range of stakeholders, including designers, deployers, and regulators involved in the certification of these systems. The source code of Vision Checklist will be made available for public use.
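The perturbation-and-report idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`adjust_brightness`, `add_gaussian_noise`, `horizontal_flip`, `run_checklist`), the three perturbations chosen, and the toy threshold classifier standing in for a model such as ViT or ResNet are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np


def adjust_brightness(images, delta):
    """Shift pixel intensities by delta, keeping values in [0, 1]."""
    return np.clip(images + delta, 0.0, 1.0)


def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return np.clip(images + rng.normal(0.0, sigma, images.shape), 0.0, 1.0)


def horizontal_flip(images):
    """Mirror each image left to right (axis 2 is width in N x H x W x C)."""
    return images[:, :, ::-1, :]


def accuracy(predict, images, labels):
    """Fraction of images whose predicted label matches the true label."""
    return float(np.mean(predict(images) == labels))


def run_checklist(predict, images, labels, perturbations):
    """Map each test name to its accuracy and its drop from clean accuracy."""
    clean = accuracy(predict, images, labels)
    report = {"clean": {"accuracy": clean, "drop": 0.0}}
    for name, perturb in perturbations.items():
        acc = accuracy(predict, perturb(images), labels)
        report[name] = {"accuracy": acc, "drop": clean - acc}
    return report


def toy_predict(images):
    """Hypothetical stand-in model: class 1 if mean intensity > 0.5, else 0."""
    return (images.mean(axis=(1, 2, 3)) > 0.5).astype(int)


rng = np.random.default_rng(0)

# Synthetic data (an assumption): 100 dark images (class 0), 100 bright (class 1).
dark = rng.uniform(0.0, 0.4, size=(100, 8, 8, 3))
bright = rng.uniform(0.6, 1.0, size=(100, 8, 8, 3))
images = np.concatenate([dark, bright])
labels = np.array([0] * 100 + [1] * 100)

perturbations = {
    "brightness+0.4": lambda x: adjust_brightness(x, 0.4),
    "gaussian_noise": lambda x: add_gaussian_noise(x, 0.05, rng),
    "horizontal_flip": horizontal_flip,
}

report = run_checklist(toy_predict, images, labels, perturbations)
for name, row in report.items():
    print(f"{name:16s} accuracy={row['accuracy']:.2f} drop={row['drop']:.2f}")
```

In this toy setting the classifier relies only on mean intensity, so the brightness shift pushes the dark images over its threshold while the flip leaves predictions unchanged. A per-perturbation table of this kind is one plausible shape for the report a system designer would read; a real evaluation would substitute a trained model and a held-out dataset.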

