COVID-19 Detection in Cough, Breath and Speech using Deep Transfer Learning and Bottleneck Features

We present an experimental investigation into the effectiveness of transfer learning and bottleneck feature extraction in detecting COVID-19 from audio recordings of cough, breath and speech. This type of screening is non-contact, does not require specialist medical expertise or laboratory facilities, and can be deployed on inexpensive consumer hardware. We use datasets that contain recordings of coughing, sneezing, speech and other noises, but no COVID-19 labels, to pre-train three deep neural networks: a CNN, an LSTM and a Resnet50. These pre-trained networks are then used in one of two ways. They are either fine-tuned on smaller datasets of coughing with COVID-19 labels, which is the transfer learning process, or used as bottleneck feature extractors. Results show that a Resnet50 classifier trained by this transfer learning process delivers optimal or near-optimal performance across all datasets. It achieves areas under the receiver operating characteristic curve (ROC AUC) of 0.98, 0.94 and 0.92 for coughs, breaths and speech, respectively. This indicates that coughs carry the strongest COVID-19 signature, followed by breath and then speech. Our results also show that applying transfer learning and extracting bottleneck features using the larger datasets without COVID-19 labels had two benefits. It improved performance, and it also minimised the standard deviation of the classifier AUCs across the outer folds of the leave-$p$-out cross-validation, which indicates better generalisation. We conclude that deep transfer learning and bottleneck feature extraction can improve COVID-19 cough, breath and speech audio classification, yielding automatic classifiers with higher accuracy.
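The abstract reports classifier quality as ROC AUC. It judges generalisation by the standard deviation of the AUCs across the outer folds of leave-$p$-out cross-validation. The following is a minimal sketch of those two evaluation quantities. It assumes that each outer fold yields binary COVID-19 labels (1 = positive) together with classifier scores. It also assumes that the splits are made per subject. The function names, subject IDs and fold data are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Iterator, List, Sequence, Tuple


def leave_p_out_splits(subjects: Sequence[str], p: int) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) partitions in which every possible group of p
    subjects is held out exactly once (assumption: splitting by subject)."""
    for test in combinations(subjects, p):
        held_out = set(test)
        train = [s for s in subjects if s not in held_out]
        yield train, list(test)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation. This is the probability that
    a randomly chosen positive recording scores higher than a randomly chosen
    negative one, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarise_folds(folds: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of the per-fold AUCs.
    A smaller standard deviation across outer folds suggests the classifier
    generalises more consistently. At least two folds are required."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    # Hypothetical subject IDs: holding out 2 of 4 subjects gives C(4, 2) = 6 splits.
    for train, test in leave_p_out_splits(["s1", "s2", "s3", "s4"], p=2):
        print("train:", train, "test:", test)

    # Hypothetical per-fold labels and scores for two outer folds.
    folds = [
        ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # perfectly separated: AUC 1.0
        ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),  # one misordered pair: AUC 0.75
    ]
    auc_mean, auc_std = summarise_folds(folds)
    print(f"AUC = {auc_mean:.3f} +/- {auc_std:.3f}")
```

The O(positives × negatives) pairwise loop is chosen here for clarity. A library routine such as scikit-learn's `roc_auc_score` computes the same value more efficiently.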
