Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment

We present first results showing that it is possible to automatically discriminate between the coughing sounds produced by patients with tuberculosis (TB) and those produced by patients with other lung ailments in a real-world noisy environment. Our experiments are based on a dataset of cough recordings obtained in a real-world clinic setting from 16 patients confirmed to be suffering from TB and 33 patients suffering from respiratory conditions confirmed to be other than TB. We have trained and evaluated several machine learning classifiers, including logistic regression (LR), support vector machines (SVM), k-nearest neighbour (KNN), multilayer perceptrons (MLP) and convolutional neural networks (CNN), within a nested k-fold cross-validation scheme, and find that, although classification is possible in all cases, the best performance is achieved using the LR classifier. In combination with feature selection by sequential forward search (SFS), our best LR system achieves an area under the ROC curve (AUC) of 0.94 using 23 features selected from a set of 78 high-resolution mel-frequency cepstral coefficients (MFCCs). This system achieves a sensitivity of 93% at a specificity of 95% and thus exceeds the 90% sensitivity at 70% specificity that the WHO considers the minimal requirement for a community-based TB triage test. We conclude that automatic classification of cough audio is promising as a viable means of low-cost, easily deployable front-line screening for TB, which will greatly benefit developing countries with a heavy TB burden.
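To make the reported figures concrete, the following is a minimal sketch of how an AUC, a sensitivity/specificity pair at a decision threshold, and a comparison against the WHO triage specification quoted above (90% sensitivity at 70% specificity) might be computed. It is not the authors' evaluation code. The function names (`auc_rank`, `sensitivity_specificity`, `meets_who_triage_spec`) and the toy scores are illustrative assumptions; only the two WHO thresholds and the 93%/95% operating point come from the abstract.

```python
from typing import Sequence, Tuple


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def meets_who_triage_spec(
    sensitivity: float,
    specificity: float,
    min_sensitivity: float = 0.90,  # WHO minimum quoted in the abstract
    min_specificity: float = 0.70,  # WHO minimum quoted in the abstract
) -> bool:
    """True if an operating point satisfies both minimum requirements."""
    return sensitivity >= min_sensitivity and specificity >= min_specificity


# Made-up classifier scores for illustration only (1 = TB, 0 = other condition).
# These are NOT data from the study.
toy_scores = [0.9, 0.8, 0.7, 0.35, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
```

With these helpers, `meets_who_triage_spec(0.93, 0.95)` checks the operating point reported in the abstract against the WHO minimum. On the toy data, lowering the threshold from 0.5 to 0.33 raises sensitivity while leaving specificity unchanged, which illustrates why the specification is stated as a sensitivity at a given specificity rather than as a single number.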


