This paper evaluates a wide range of audio-based deep learning frameworks applied to breathing, cough, and speech sounds for detecting COVID-19. In general, the audio recordings are first transformed into low-level spectrogram features, which are then fed into pre-trained deep learning models to extract high-level embedding features. Next, the dimensionality of these high-level embeddings is reduced before fine-tuning a Light Gradient Boosting Machine (LightGBM) as the back-end classifier. Our experiments on the Second DiCOVA Challenge achieved the highest Area Under the Curve (AUC), F1 score, sensitivity, and specificity of 89.03%, 64.41%, 63.33%, and 95.13%, respectively. Based on these scores, our method outperforms state-of-the-art systems and improves on the challenge baseline by 4.33%, 6.00%, and 8.33% in terms of AUC, F1 score, and sensitivity, respectively.
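
To make the described pipeline concrete, the following is a minimal Python sketch of its front-end/back-end structure: low-level spectrogram extraction, high-level embedding, dimensionality reduction, and LightGBM classification. The mel-spectrogram settings, the pooling function standing in for the pre-trained embedding network, the use of PCA for reduction, and the synthetic demo data are all illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from lightgbm import LGBMClassifier

def low_level_features(path, sr=16000, n_mels=64):
    """Audio recording -> low-level log-mel spectrogram (assumed settings)."""
    audio, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def high_level_embedding(spectrogram):
    """Stand-in for the pre-trained deep models: here we simply mean- and
    std-pool the spectrogram over time, whereas the paper feeds the
    spectrogram through pre-trained networks to obtain embeddings."""
    return np.concatenate([spectrogram.mean(axis=1), spectrogram.std(axis=1)])

# Demo on synthetic embeddings so the sketch runs without audio data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))      # 200 recordings, 128-dim embeddings
y = rng.integers(0, 2, size=200)     # binary COVID-19 labels (synthetic)

pca = PCA(n_components=32).fit(X[:150])   # dimensionality reduction step
X_reduced = pca.transform(X)

clf = LGBMClassifier(n_estimators=200)    # LightGBM back-end classifier
clf.fit(X_reduced[:150], y[:150])
scores = clf.predict_proba(X_reduced[150:])[:, 1]
print("AUC on held-out synthetic split:", roc_auc_score(y[150:], scores))
```

On real data, `low_level_features` and `high_level_embedding` would replace the synthetic matrix `X`, with one embedding vector per recording; the train/test split and LightGBM hyperparameters above are placeholders.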