Recently, sound-based COVID-19 detection studies have shown great promise for scalable and prompt digital pre-screening. However, two unsolved issues still hinder its use in practice. First, datasets collected for model training are often imbalanced, with a considerably smaller proportion of users who tested positive, which makes it harder to learn representative and robust features. Second, deep learning models are generally overconfident in their predictions. Clinically, false predictions add to healthcare costs, so an estimate of the uncertainty of each screening result would help mitigate them. To handle these issues, we propose an ensemble framework in which multiple deep learning models for sound-based COVID-19 detection are trained on different but balanced subsets of the original data. As such, data are utilized more effectively than with traditional up-sampling and down-sampling approaches: an AUC of 0.74 with a sensitivity of 0.68 and a specificity of 0.69 is achieved. Simultaneously, we estimate uncertainty from the disagreement across the multiple models. We show that false predictions often yield higher uncertainty, which enables us to suggest that users with uncertainty higher than a threshold repeat the audio test on their phones, or take clinical tests if digital diagnosis still fails. This study paves the way for a more robust sound-based COVID-19 automated screening system.
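The two ideas in the abstract, training each ensemble member on a balanced subset and reading uncertainty from the disagreement between members, can be illustrated with a minimal NumPy sketch. The function names `balanced_subsets` and `ensemble_decision` are hypothetical, and several details are assumptions not stated in the abstract: each subset keeps every positive sample and pairs it with a disjoint, equally sized chunk of negatives, and disagreement is measured as the standard deviation of the members' predicted probabilities. The deep learning models themselves are left out; only their output probabilities are assumed.

```python
import numpy as np


def balanced_subsets(labels, n_models, seed=0):
    """Return one index array per ensemble member.

    Assumption: each subset holds all positives (label 1) plus a
    disjoint, equally sized chunk of negatives (label 0).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = rng.permutation(np.flatnonzero(labels == 0))
    if len(neg) < n_models * len(pos):
        raise ValueError("not enough negatives for disjoint balanced subsets")
    return [
        np.concatenate([pos, neg[i * len(pos):(i + 1) * len(pos)]])
        for i in range(n_models)
    ]


def ensemble_decision(member_probs, threshold):
    """Combine member outputs of shape (n_models, n_samples).

    Returns the mean probability, the disagreement (standard deviation
    across members, an assumed uncertainty measure), and a boolean mask
    of samples whose uncertainty exceeds the threshold.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_prob = member_probs.mean(axis=0)
    uncertainty = member_probs.std(axis=0)
    needs_retest = uncertainty > threshold
    return mean_prob, uncertainty, needs_retest


if __name__ == "__main__":
    # Toy data: 3 positives, 12 negatives, 4 ensemble members.
    labels = np.array([1] * 3 + [0] * 12)
    for idx in balanced_subsets(labels, n_models=4):
        print(sorted(idx.tolist()), "positives:", int(labels[idx].sum()))

    # Hypothetical member probabilities for two users.
    probs = [[0.90, 0.2], [0.80, 0.7], [0.85, 0.3]]
    mean_prob, uncertainty, needs_retest = ensemble_decision(probs, threshold=0.1)
    print(mean_prob, uncertainty, needs_retest)
```

In this toy run the members agree on the first user and disagree on the second, so only the second would be asked to repeat the audio test. The threshold of 0.1 is arbitrary here; in practice it would be tuned on held-out data.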