We present a deep learning-based automatic cough classifier that can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, have cough as a predominant symptom and claim thousands of lives each year. The cough audio recordings were collected in both indoor and outdoor settings and were also uploaded via smartphones by subjects around the globe, and thus contain various levels of noise. The cough data comprise 1.68 hours of TB coughs, 18.54 minutes of COVID-19 coughs and 1.69 hours of healthy coughs from 47 TB patients, 229 COVID-19 patients and 1498 healthy subjects, and were used to train and evaluate a CNN, an LSTM and a Resnet50. These three deep architectures were also pre-trained on 2.14 hours of sneeze, 2.91 hours of speech and 2.79 hours of noise for improved performance. The class imbalance in our dataset was addressed by applying the SMOTE data balancing technique and by using performance metrics such as the F1-score and AUC. Our study shows that the highest F1-scores, 0.9259 and 0.8631, were achieved by a pre-trained Resnet50 for the two-class (TB vs COVID-19) and three-class (TB vs COVID-19 vs healthy) cough classification tasks, respectively. The application of deep transfer learning improved the classifiers' performance and made them more robust, as they generalise better over the cross-validation folds. Their performance exceeds the TB triage test requirements set by the World Health Organisation (WHO). The features producing the best performance include higher-order MFCCs, suggesting that the differences between TB and COVID-19 coughs are not perceivable by the human ear. This type of cough audio classification is non-contact, cost-effective and can easily be deployed on a smartphone, and thus can be an excellent tool for both TB and COVID-19 screening.
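The abstract names SMOTE as the balancing technique but does not describe how it was applied. As a rough illustration only, the core idea (synthesising minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours) might be sketched as follows. The function name `smote_oversample`, the choice of `k=5` and the use of plain feature vectors are assumptions made for this sketch, not details taken from the study.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority-class rows X_min.

    Hypothetical helper illustrating SMOTE-style interpolation; it is not
    the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a sample cannot have more neighbours than n - 1

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # randomly chosen base samples
    pick = rng.integers(0, k, size=n_new)   # which of the k neighbours to use
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)

    # Each synthetic sample lies on the segment between base and neighbour.
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

In practice the synthetic rows would be appended to the under-represented class (here, COVID-19 coughs) in the training folds only, so that evaluation folds contain real recordings; that split policy is likewise an assumption of this sketch.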