We present an experimental investigation into the automatic detection of COVID-19 from smartphone recordings of coughs, breaths and speech. This type of screening is attractive because it is non-contact, requires no specialist medical expertise or laboratory facilities, and can easily be deployed on inexpensive consumer hardware. We base our experiments on two datasets, Coswara and ComParE, containing recordings of coughing, breathing and speech from subjects around the globe. We consider seven machine learning classifiers, all trained and evaluated using leave-p-out cross-validation. For the Coswara data, the highest AUC of 0.92 was achieved by a Resnet50 architecture on breath recordings. For the ComParE data, the highest AUC of 0.93 was achieved by a k-nearest neighbours (KNN) classifier on cough recordings after selecting the best 12 features using sequential forward selection (SFS), while an AUC of 0.91 was achieved on speech by a multilayer perceptron (MLP) when SFS was used to select the best 23 features. We conclude that, among the vocal audio types considered, coughs carry the strongest COVID-19 signature, followed by breath and speech. Although these signatures are not perceptible to the human ear, machine-learning-based COVID-19 detection from vocal audio recorded with a smartphone is possible.
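To make the ComParE-style evaluation concrete, the sketch below shows how SFS can select the best 12 features for a KNN classifier, which is then scored by AUC under cross-validation with scikit-learn. This is not the authors' code: the feature matrix `X` and labels `y` are random placeholders standing in for acoustic features and COVID-19 status, `n_neighbors=5` is an assumed setting, and the paper's leave-p-out protocol is approximated here with stratified k-fold for tractability.

```python
# Hedged sketch of SFS + KNN evaluation; X, y and all hyperparameters
# are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))    # placeholder acoustic feature matrix
y = rng.integers(0, 2, size=200)  # placeholder COVID-19 labels (0/1)

knn = KNeighborsClassifier(n_neighbors=5)  # assumed neighbourhood size
pipe = make_pipeline(
    StandardScaler(),
    # Forward SFS: greedily add features until the best 12 are kept.
    SequentialFeatureSelector(knn, n_features_to_select=12, direction="forward"),
    knn,
)

# Stratified 5-fold CV stands in for the paper's leave-p-out scheme.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC: {auc.mean():.2f}")
```

On real data, the same pipeline would be fitted on extracted audio features rather than random numbers; swapping KNN for an MLP (with 23 selected features) mirrors the speech experiment described above.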