Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment

Objective: To automatically discriminate between the coughing sounds produced by patients with tuberculosis (TB) and those produced by patients with other lung ailments. Approach: We present experiments based on a dataset of 1358 forced cough recordings obtained in a developing-world clinic from 16 patients with confirmed active pulmonary TB and 35 patients who had respiratory conditions suggestive of TB but were confirmed to be TB negative. Using nested cross-validation, we trained and evaluated five machine learning classifiers: logistic regression (LR), support vector machines (SVM), k-nearest neighbour (KNN), multilayer perceptrons (MLP) and convolutional neural networks (CNN). Main Results: Although classification is possible in all cases, the best performance is achieved using LR. In combination with feature selection by sequential forward selection (SFS), our best LR system achieves an area under the ROC curve (AUC) of 0.94 using 23 features selected from a set of 78 high-resolution mel-frequency cepstral coefficients (MFCCs). This system achieves a sensitivity of 93% at a specificity of 95% and thus exceeds the specification of 90% sensitivity at 70% specificity that the World Health Organisation (WHO) considers the minimal requirement for a community-based TB triage test. Significance: Automatic classification of cough audio, when applied to symptomatic patients requiring investigation for TB, can meet the WHO triage specifications for identifying patients who should undergo expensive downstream molecular testing. This makes it a promising and viable means of low-cost, easily deployable frontline TB screening, which can especially benefit developing countries with a heavy TB burden.
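The headline result above rests on two checks: how well the classifier ranks TB coughs above non-TB coughs (AUC), and whether a chosen decision threshold gives a sensitivity and specificity at or above the WHO triage minimum of 90% and 70%. The following is a minimal sketch of both checks using only the Python standard library. The function names, the label convention (1 = TB, 0 = other lung ailment) and any threshold you pass in are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence, Tuple

# Minimum triage specification quoted in the abstract (WHO).
WHO_MIN_SENSITIVITY = 0.90
WHO_MIN_SPECIFICITY = 0.70


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen TB-positive example
    (label 1) scores higher than a randomly chosen TB-negative one (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Predict TB when score >= threshold; return (sensitivity, specificity).
    Assumes both classes are present (otherwise division by zero)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_tb = s >= threshold
        if y == 1:
            if predicted_tb:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_tb:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def meets_who_triage(sensitivity: float, specificity: float) -> bool:
    """True if an operating point satisfies the WHO minimum triage specification."""
    return sensitivity >= WHO_MIN_SENSITIVITY and specificity >= WHO_MIN_SPECIFICITY
```

For the operating point reported in the abstract, `meets_who_triage(0.93, 0.95)` returns `True`. Note that AUC summarises ranking over all thresholds, whereas the WHO check applies to one chosen threshold, so both numbers are needed.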


Related content

Feature selection


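Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific performance criterion of the system is optimised. By keeping only the most effective of the original features, it reduces the dimensionality of the dataset; it is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a good model.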
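Sequential forward selection (SFS), which the abstract uses to pick 23 of 78 MFCC features, is one greedy way of choosing N of M features: start from the empty set and repeatedly add whichever remaining feature most improves the criterion. The following is a minimal sketch in which the criterion is a caller-supplied function. In the paper's setting that criterion would presumably be a validation AUC of a logistic regression model, but the function name `sequential_forward_selection` and the scoring function are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Criterion to maximise for a candidate feature subset (e.g. a validation AUC).
ScoreFn = Callable[[Sequence[int]], float]


def sequential_forward_selection(
    n_total: int, n_select: int, score: ScoreFn
) -> Tuple[List[int], List[float]]:
    """Greedily choose n_select of n_total feature indices.

    Returns the selected indices in the order they were added, and the
    criterion value after each addition."""
    if not 0 <= n_select <= n_total:
        raise ValueError("n_select must lie between 0 and n_total")
    selected: List[int] = []
    remaining = set(range(n_total))
    history: List[float] = []
    while len(selected) < n_select:
        # Try each remaining feature added to the current subset and keep the
        # best one; ties are broken in favour of the lower feature index.
        best_feature = max(remaining, key=lambda f: (score(selected + [f]), -f))
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(score(selected))
    return selected, history
```

For example, with hypothetical per-feature weights `[0.1, 0.5, 0.3, 0.05]` and an additive criterion `lambda subset: sum(weights[i] for i in subset)`, selecting 2 of 4 features returns `[1, 2]`. Because each step commits to the locally best feature, SFS is not guaranteed to find the globally optimal subset. scikit-learn offers a library implementation as `SequentialFeatureSelector` with `direction="forward"`.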
