Purpose: we evaluated the generalization capability of deep neural networks (DNNs) trained to classify chest X-rays as Covid-19, normal or pneumonia, using a relatively small and mixed dataset. Methods: we proposed a DNN that performs lung segmentation and classification by stacking a segmentation module (U-Net), an original intermediate module and a classification module (DenseNet201). To evaluate generalization, we tested the DNN on an external dataset (from distinct localities) and used Bayesian inference to estimate probability distributions of performance metrics. Results: our DNN achieved 0.917 AUC on the external test dataset, while a DenseNet without segmentation achieved 0.906. Bayesian inference indicated a mean accuracy of 76.1% with a 95% HDI (highest density interval, which concentrates 95% of the metric's probability mass) of [69.5%, 82.6%] with segmentation and, without segmentation, 71.7% with a 95% HDI of [64.6%, 78.6%]. Conclusion: employing a novel DNN evaluation technique, which uses layer-wise relevance propagation (LRP) and Brixia scores, we discovered that the areas where radiologists found strong Covid-19 symptoms are the most important for the stacked DNN's classification. External validation showed lower accuracy than internal validation, indicating difficulty in generalization, although segmentation improves it. Finally, the performance on the external dataset and the LRP analysis suggest that DNNs can be trained on small and mixed datasets and still successfully detect Covid-19.
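To make the HDI figures above more concrete, the following is a minimal sketch of how a 95% highest density interval for accuracy could be estimated from posterior samples. It is not the model used in this work: it assumes a simple Beta-Binomial posterior with a uniform prior, and the function names (`hdi_from_samples`, `accuracy_posterior`) and the counts (114 correct out of 150) are hypothetical values chosen only for illustration.

```python
import numpy as np


def hdi_from_samples(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    sorted_s = np.sort(np.asarray(samples, dtype=float))
    n = sorted_s.size
    k = int(np.ceil(mass * n))  # number of samples the interval must cover
    if k < 1 or k > n:
        raise ValueError("mass must be in (0, 1] and samples must be non-empty")
    # Width of every candidate interval that spans k consecutive sorted samples.
    widths = sorted_s[k - 1:] - sorted_s[: n - k + 1]
    start = int(np.argmin(widths))
    return sorted_s[start], sorted_s[start + k - 1]


def accuracy_posterior(n_correct, n_total, n_draws=100_000, seed=0):
    """Draw from Beta(1 + correct, 1 + wrong), the posterior under a uniform prior.

    Assumption: a Beta-Binomial model, used here only as a simple stand-in.
    """
    rng = np.random.default_rng(seed)
    return rng.beta(1 + n_correct, 1 + n_total - n_correct, size=n_draws)


# Hypothetical counts, not taken from the study.
samples = accuracy_posterior(n_correct=114, n_total=150)
low, high = hdi_from_samples(samples, mass=0.95)
print(f"posterior mean accuracy: {samples.mean():.3f}")
print(f"95% HDI: [{low:.3f}, {high:.3f}]")
```

Unlike an equal-tailed credible interval, the HDI is the shortest interval holding the requested probability mass, so for a skewed posterior its endpoints need not sit at the 2.5th and 97.5th percentiles.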