Deep Learning with robustness to missing data: A novel approach to the detection of COVID-19

In the context of the current global pandemic and the limitations of the RT-PCR test, we propose a novel deep learning architecture, DFCN (Denoising Fully Connected Network), for the detection of COVID-19 using laboratory tests and chest x-rays. Since medical facilities around the world differ enormously in which laboratory tests and chest imaging are available, DFCN is designed to be robust to missing input data. An ablation study extensively evaluates the performance benefits of the DFCN architecture as well as its robustness to missing inputs. Data from 1088 patients with confirmed RT-PCR results are obtained from two independent medical facilities. The data collected include results from 27 laboratory tests and a chest x-ray scored by a deep learning network. Training and test datasets are defined based on the source medical facility. The data are made publicly available. The performance of DFCN in predicting the RT-PCR result is compared with 3 related architectures as well as a Random Forest baseline. All models are trained with varying levels of masked input data to encourage robustness to missing inputs. Missing data is simulated at test time by masking inputs randomly. Using area under the receiver operating characteristic curve (AUC) as a metric, DFCN outperforms all other models with statistical significance on random subsets of input data with 2-27 available inputs. When all 28 inputs are available, DFCN obtains an AUC of 0.924, higher than that achieved by any other model. Furthermore, with clinically meaningful subsets of parameters consisting of just 6 and 7 inputs respectively, DFCN also achieves higher AUCs than any other model, with values of 0.909 and 0.919.
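
The random masking used to simulate missing inputs can be illustrated with a minimal sketch. The abstract does not say how masked values are encoded, so the zero-fill, the returned availability mask, and the name `mask_inputs` are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def mask_inputs(x, n_keep, rng):
    """Keep n_keep randomly chosen inputs per row and zero out the rest.

    Assumption: masked inputs are replaced by 0.0. The source abstract
    does not specify the encoding of a missing value.
    """
    n_samples, n_features = x.shape
    # Rank random scores within each row; ranks form a permutation of
    # 0..n_features-1, so exactly n_keep entries per row fall below n_keep.
    ranks = rng.random((n_samples, n_features)).argsort(axis=1).argsort(axis=1)
    keep = ranks < n_keep
    return np.where(keep, x, 0.0), keep

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 28))  # 4 hypothetical patients, 28 inputs each
x_masked, keep = mask_inputs(x, n_keep=6, rng=rng)
```

Each row of `x_masked` retains 6 of the 28 inputs, and `keep` records which ones. Varying `n_keep` between batches would correspond to training with varying levels of masked input data.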
