FAIR4Cov: Fused Audio Instance and Representation for COVID-19 Detection

Audio-based classification of body sounds has long been studied as a way to support diagnostic decisions, particularly for pulmonary diseases. In response to the urgency of the COVID-19 pandemic, a growing number of models have been developed to identify COVID-19 patients from acoustic input. Most models focus on cough, because dry cough is the best-known symptom of COVID-19. However, other body sounds, such as breath and speech, have also been shown to correlate with COVID-19. In this work, rather than relying on a single body sound, we propose Fused Audio Instance and Representation for COVID-19 Detection (FAIR4Cov). It constructs a joint feature vector from multiple body sounds in both waveform and spectrogram representation. The core component of FAIR4Cov is a self-attention fusion unit that is trained to learn the relations among the body sounds and audio representations and to integrate them into a compact feature vector. We run experiments on different combinations of body sounds using waveform only, spectrogram only, and a joint representation of waveform and spectrogram. Our findings show that using self-attention to combine features extracted from cough, breath, and speech sounds gives the best performance, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. This AUC is 0.0227 higher than that of the models trained on spectrograms only and 0.0847 higher than that of the models trained on waveforms only. The results demonstrate that combining the spectrogram with the waveform representation enriches the extracted features and outperforms models that use a single representation.
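To make the fusion idea concrete, the following is a minimal sketch of single-head self-attention over six feature vectors, one per combination of body sound (cough, breath, speech) and representation (waveform, spectrogram). It is an illustration only, not the FAIR4Cov implementation: the embedding size, the random projection matrices, the random input features, the mean pooling step, and the function names `softmax` and `self_attention_fuse` are all assumptions that the abstract does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention_fuse(tokens, w_q, w_k, w_v):
    """Fuse a set of feature vectors with scaled dot-product self-attention.

    tokens: array of shape (n_tokens, d), one row per (sound, representation).
    Returns the fused vector of shape (d_v,) and the attention matrix.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    # Each token scores every other token; rows are normalized to sum to 1.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)
    attended = attn @ v
    # Assumption: mean pooling collapses the attended tokens to one vector.
    fused = attended.mean(axis=0)
    return fused, attn


rng = np.random.default_rng(0)
d = 16  # hypothetical embedding size

# Hypothetical inputs: in practice these would come from trained
# waveform and spectrogram feature extractors, not random numbers.
names = [(sound, rep)
         for sound in ("cough", "breath", "speech")
         for rep in ("waveform", "spectrogram")]
tokens = rng.standard_normal((len(names), d))

# Random projections stand in for learned query/key/value weights.
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = self_attention_fuse(tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch each row of `attn` indicates how strongly one sound and representation pair attends to the others, and `fused` plays the role of the compact joint feature vector that a downstream classifier would consume.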
