We present a machine-learning-based COVID-19 cough classifier that can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact and easy to apply, and it can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19. The datasets used in this study include subjects from all six continents and contain both forced and natural coughs, indicating that the approach is widely applicable. The publicly available Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects, while the second, smaller dataset was collected mostly in South Africa and contains 18 COVID-19 positive and 26 COVID-19 negative subjects who have undergone a SARS-CoV-2 laboratory test. Both datasets indicate that COVID-19 positive coughs are 15\%-20\% shorter than non-COVID coughs. Class imbalance in the datasets was addressed by applying the synthetic minority oversampling technique (SMOTE). A leave-$p$-out cross-validation scheme was used to train and evaluate seven machine learning classifiers: logistic regression (LR), k-nearest neighbours (KNN), support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM) and ResNet50. Our results show that, although all classifiers were able to identify COVID-19 coughs, the ResNet50 classifier performed best at discriminating between COVID-19 positive and healthy coughs, with an area under the ROC curve (AUC) of 0.98. An LSTM classifier was best able to discriminate between COVID-19 positive and COVID-19 negative coughs, with an AUC of 0.94 after selecting the best 13 features by sequential forward selection (SFS). Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of non-contact COVID-19 screening.
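As a rough illustration of two ideas named in the abstract, the sketch below shows how SMOTE-style oversampling can balance a 92-versus-1079 class split and how an AUC can be computed from classifier scores. It is not the authors' implementation: the two-dimensional Gaussian features are toy data, the neighbour count `k=5` and the function names `smote` and `auc` are assumptions made for this example, and pure Python is used only to keep the block self-contained. In practice an established library implementation would normally be preferred.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a positive
    example scores higher than a negative one (ties count one half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy two-dimensional features, purely illustrative (not real cough data).
data_rng = random.Random(1)
positive = [[data_rng.gauss(1.0, 0.5), data_rng.gauss(1.0, 0.5)] for _ in range(92)]
healthy = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(1079)]

# Oversample the minority class until both classes have the same size.
balanced_positive = positive + smote(positive, len(healthy) - len(positive))
print(len(balanced_positive), len(healthy))
```

To avoid leaking synthetic samples into evaluation, oversampling of this kind is normally applied only to the training portion of each cross-validation fold, never to the held-out subjects.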