Using smartphone-collected respiratory sounds to train deep learning models for detecting and classifying COVID-19 has recently become popular. This approach removes the need for in-person testing procedures, especially in rural regions where medical supplies, experienced workers, and equipment are limited. However, existing sound-based diagnostic approaches are trained in a fully supervised manner, which requires large-scale, well-labelled data. It is therefore critical to find new methods that leverage unlabelled respiratory data, which can be obtained more easily. In this paper, we propose a novel self-supervised learning framework for COVID-19 cough classification. A contrastive pre-training phase is introduced to train a Transformer-based feature encoder with unlabelled data. Specifically, we design a random masking mechanism to learn robust representations of respiratory sounds. The pre-trained feature encoder is then fine-tuned in the downstream phase to perform cough classification. In addition, different ensembles with varied random masking rates are explored in the downstream phase. Through extensive evaluations, we demonstrate that the proposed contrastive pre-training, the random masking mechanism, and the ensemble architecture contribute to improving cough classification performance.
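As a rough illustration of two ideas named in the abstract, random masking at a chosen rate and ensembling models built with different masking rates, the following is a minimal sketch and not the authors' implementation. It assumes that masking means zeroing whole feature frames and that the ensemble averages per-class probabilities; the abstract specifies neither. The names `random_mask` and `ensemble_average` are hypothetical.

```python
import random


def random_mask(frames, mask_rate, rng):
    """Zero out a random subset of frames.

    Assumption: masking replaces whole frames with zeros. The abstract
    does not say how masked inputs are represented.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be in [0, 1]")
    n_masked = round(mask_rate * len(frames))
    masked_idx = set(rng.sample(range(len(frames)), n_masked))
    return [
        [0.0] * len(frame) if i in masked_idx else list(frame)
        for i, frame in enumerate(frames)
    ]


def ensemble_average(prob_lists):
    """Average per-class probabilities from several models.

    Assumption: the ensemble is a plain mean of model outputs.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


# Toy input: four frames with two features each (made-up values).
rng = random.Random(0)
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked = random_mask(frames, mask_rate=0.5, rng=rng)

# Toy outputs of two hypothetical classifiers, e.g. fine-tuned from
# encoders that used different masking rates.
avg = ensemble_average([[0.8, 0.2], [0.6, 0.4]])
```

In a contrastive setup, two independently masked copies of the same recording could serve as a positive pair for the encoder, though the abstract does not state how pairs are formed.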