The current generation of deep neural networks has achieved near-human results on "closed-set" image recognition, in which the classes being evaluated overlap with the training classes. Many recent methods, termed "open-set" recognition algorithms, address the importance of the unknown: they try to reject unknown classes while maintaining high recognition accuracy on known classes. However, it remains unclear how open-set methods trained on a general domain such as ImageNet would perform on a different but more specific domain, such as the medical domain. Without principled and formal evaluations of the effectiveness of these general open-set methods, artificial intelligence (AI)-based medical diagnostics would suffer ineffective adoption and an increased risk of poor decision making. In this paper, we conduct rigorous evaluations of state-of-the-art open-set methods, exploring open-set scenarios ranging from "similar-domain" to "different-domain" and comparing the methods on a variety of general and medical domain datasets. We summarise the results and core ideas and explain how the models react to varying degrees of openness and different distributions of open classes. Through quantitative and qualitative analysis of the results, we show the main differences between general domain-trained and medical domain-trained open-set models. We also assess aspects of model robustness relevant to real clinical workflows, namely confidence calibration and inference efficiency.