Deep learning models are being applied to ever more use cases, with astonishing success stories, but how do they perform in the real world? To test a model, a specific, cleaned data set is assembled; once deployed in the real world, however, the model will face unexpected, out-of-distribution (OOD) data. In this work, we show that the so-called "radiologist-level" CheXnet model fails to recognize OOD images and instead classifies them as having lung disease. To address this issue, we propose in-distribution voting, a novel method for classifying out-of-distribution images in multi-label classification. Using independent class-wise in-distribution (ID) predictors trained on both ID and OOD data, we achieve, on average, 99% ID-classification specificity and 98% sensitivity, significantly improving end-to-end performance over previous work on the ChestX-ray14 data set. Our method surpasses other output-based OOD detectors even when trained solely with ImageNet as OOD data and tested with X-ray OOD images.
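The gating logic behind in-distribution voting can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each class has its own ID-vs-OOD predictor, and all function names, thresholds, and array shapes here are hypothetical.

```python
import numpy as np

def in_distribution_voting(disease_probs, id_probs,
                           disease_thr=0.5, id_thr=0.5):
    """Sketch of class-wise in-distribution voting (assumed interface).

    disease_probs, id_probs: arrays of shape (n_samples, n_classes),
    holding multi-label disease probabilities and per-class ID scores.
    """
    id_mask = id_probs >= id_thr          # class-wise "in-distribution" votes
    preds = disease_probs >= disease_thr  # raw multi-label predictions
    # A disease prediction only counts if that class's ID predictor
    # votes "in-distribution" for this image.
    gated = preds & id_mask
    # An image receiving no ID vote from any class is rejected as OOD.
    is_ood = ~id_mask.any(axis=1)
    return gated, is_ood

# Toy example: the second sample gets no ID votes and is rejected as OOD.
disease = np.array([[0.9, 0.2], [0.8, 0.7]])
id_score = np.array([[0.9, 0.8], [0.1, 0.2]])
preds, ood = in_distribution_voting(disease, id_score)
```

With this gating, an OOD image cannot accumulate spurious disease labels, since every positive prediction must be backed by that class's ID predictor.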