Objective: Automatic discrimination between the cough sounds produced by patients with tuberculosis (TB) and those produced by patients with other lung ailments. Approach: We present experiments based on a dataset of 1358 forced cough recordings obtained in a developing-world clinic from 16 patients with confirmed active pulmonary TB and 35 patients suffering from respiratory conditions suggestive of TB but confirmed to be TB negative. Using nested cross-validation, we trained and evaluated five machine learning classifiers: logistic regression (LR), support vector machines (SVM), k-nearest neighbour (KNN), multilayer perceptrons (MLP) and convolutional neural networks (CNN). Main Results: Although classification is possible with all five classifiers, the best performance is achieved using LR. In combination with feature selection by sequential forward selection (SFS), our best LR system achieves an area under the ROC curve (AUC) of 0.94 using 23 features selected from a set of 78 high-resolution mel-frequency cepstral coefficients (MFCCs). This system achieves a sensitivity of 93\% at a specificity of 95\% and thus exceeds the 90\% sensitivity at 70\% specificity considered by the World Health Organisation (WHO) as the minimal requirement for a community-based TB triage test. Significance: Automatic classification of cough audio, when applied to symptomatic patients requiring investigation for TB, can meet the WHO triage specifications for identifying patients who should undergo expensive downstream molecular testing. This makes it a promising and viable means of low-cost, easily deployable frontline screening for TB, which can especially benefit developing countries with a heavy TB burden.
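The evaluation quantities named above (AUC, sensitivity, specificity and the WHO triage threshold) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy labels and scores, and the decision threshold are all hypothetical and chosen only for illustration. The only values taken from the abstract are the reported operating point (93\% sensitivity at 95\% specificity) and the WHO minimum (90\% sensitivity at 70\% specificity).

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)
    when every score >= threshold is classified as TB positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def meets_who_triage_spec(
    sensitivity: float,
    specificity: float,
    min_sensitivity: float = 0.90,  # WHO minimum quoted in the abstract
    min_specificity: float = 0.70,  # WHO minimum quoted in the abstract
) -> bool:
    """True if an operating point satisfies both WHO triage minimums."""
    return sensitivity >= min_sensitivity and specificity >= min_specificity


# Hypothetical toy data: 1 = TB positive, 0 = TB negative.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.45)

# Operating point reported in the abstract: 93% sensitivity, 95% specificity.
reported_ok = meets_who_triage_spec(0.93, 0.95)
```

With the toy data the classifier misses one positive and raises one false alarm at the chosen threshold, so it falls short of the WHO minimums, whereas the operating point reported in the abstract satisfies both.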