Low- and middle-income countries face challenges arising from a lack of data on cause of death (COD), which can limit decision-making on population health and disease management. A verbal autopsy (VA) can provide information about a COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data in the form of an open-ended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in three settings: binary features, text features, and a combination of binary and text features, all derived from VA reports from rural South Africa. The results indicate that narrative text features contain valuable information for determining COD and that combining binary and text features improves automated COD classification.
Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine Learning, Natural Language Processing
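The three evaluation settings described in the abstract (binary features, text features, and their combination, each scored by cross-validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records are invented, the narrative is reduced to a binary bag-of-words, and a 1-nearest-neighbour rule stands in for the machine learning approaches, which the abstract does not name. The function names `build_vocab`, `text_vector`, `featurize`, `predict_1nn`, and `cross_validate` are hypothetical.

```python
# Hypothetical toy records: (binary symptom indicators, narrative text, COD label).
# These are invented for illustration and are not taken from any VA dataset.
RECORDS = [
    ([1, 0, 1], "excessive thirst and high sugar", "diabetes"),
    ([1, 0, 0], "sugar problem frequent urination thirst", "diabetes"),
    ([1, 1, 1], "high sugar and foot wound", "diabetes"),
    ([0, 1, 0], "cough fever and chest pain", "other"),
    ([0, 1, 1], "fever cough night sweats", "other"),
    ([0, 0, 0], "road accident head injury", "other"),
]


def build_vocab(texts):
    """Sorted list of distinct lowercase tokens across the given texts."""
    return sorted({tok for t in texts for tok in t.lower().split()})


def text_vector(text, vocab):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the text."""
    tokens = set(text.lower().split())
    return [1 if w in tokens else 0 for w in vocab]


def featurize(records, setting, vocab):
    """Build feature rows for one of the settings: binary, text, combined."""
    rows = []
    for binary, text, _label in records:
        if setting == "binary":
            rows.append(list(binary))
        elif setting == "text":
            rows.append(text_vector(text, vocab))
        elif setting == "combined":
            rows.append(list(binary) + text_vector(text, vocab))
        else:
            raise ValueError(f"unknown setting: {setting}")
    return rows


def predict_1nn(train_X, train_y, x):
    """Label of the training row with the smallest Hamming distance to x."""
    def distance(i):
        return sum(a != b for a, b in zip(train_X[i], x))
    return train_y[min(range(len(train_X)), key=distance)]


def cross_validate(records, setting, k=3):
    """k-fold cross-validated accuracy; record i goes to test fold i % k."""
    correct = 0
    for fold in range(k):
        test = [r for i, r in enumerate(records) if i % k == fold]
        train = [r for i, r in enumerate(records) if i % k != fold]
        # The vocabulary is built from the training fold only, so no
        # information from held-out narratives leaks into the features.
        vocab = build_vocab([text for _b, text, _l in train])
        train_X = featurize(train, setting, vocab)
        train_y = [label for _b, _t, label in train]
        for x, (_b, _t, gold) in zip(featurize(test, setting, vocab), test):
            correct += predict_1nn(train_X, train_y, x) == gold
    return correct / len(records)


if __name__ == "__main__":
    for setting in ("binary", "text", "combined"):
        print(setting, cross_validate(RECORDS, setting))
```

Accuracies on such a tiny invented dataset carry no meaning; the point of the sketch is only the structure, in which the same folds are reused across the three feature settings so that their scores are comparable.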