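Project title: Research on a Birth Defect Risk Prediction Model Based on Probability Calibration and Ensemble Learning
Project number: No.81502897
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Medicine and Health
Project author: Luo Yanhong (罗艳虹)
Author affiliation: Shanxi Medical University
Funding amount: 180,000 yuan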
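Chinese abstract: Birth defects have become a major public health problem affecting population quality and population health, and accurately predicting the risk of birth defects is of great importance for their prevention. In China, surveillance methods, diagnostic techniques and analytical approaches for birth defects are steadily improving, but missed diagnoses, misdiagnoses and under-reporting persist and bias the predicted probabilities of birth defects. In addition, birth defect cohort data are class-imbalanced, which degrades the predictive performance of risk prediction models built on them. This project addresses birth defect risk prediction. It proposes to use probability calibration and well-performing machine learning algorithms to build random forest and support vector machine models calibrated with Platt scaling, and to combine their predicted probabilities with those of a traditional logistic model through ensemble learning in order to improve predictive performance. After the probability calibration and ensemble learning techniques have been validated in numerical simulations and on public datasets from the UCI Machine Learning Repository, they will be applied in an empirical analysis of birth cohort data from Shanxi Province to screen effectively for populations at high risk of birth defects and to support risk warning and control. The project can provide a theoretical basis for designing birth defect intervention strategies and is of great significance for preventing birth defects and improving population quality.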
Chinese keywords: Birth defects;Probability calibration;Ensemble learning;Risk prediction
English abstract: Birth defects have become an important public health issue affecting population quality and population health. Accurate prediction of birth defect risk is of great significance for prevention. At present, the monitoring methods, diagnostic technologies and analytical methods for birth defects in China are steadily improving, but deficiencies such as missed diagnosis, misdiagnosis and missing reports remain, and they bias the predicted probability of birth defects. In addition, birth defect cohort data are class-imbalanced, which reduces the predictive performance of the resulting risk prediction models. This project focuses on risk prediction models for birth defects. Using probability calibration and machine learning algorithms with strong performance, an improved model will be built to predict the risk probability of birth defects by combining random forest and support vector machine models calibrated with Platt scaling with the traditional logistic model. The probability calibration and ensemble learning techniques will first be verified by numerical simulation and on public data sets from the UCI Machine Learning Repository. The improved model will then be applied to birth cohort data from Shanxi Province to screen effectively for groups at high risk of birth defects and to give early warning of, and help control, the identified risk. This project may provide a basis for developing birth defect intervention strategies and is of great significance for preventing birth defects and improving population quality.
English keywords: Birth defect;Probability calibration;Ensemble learning;Risk prediction
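
The abstract describes calibrating random forest and support vector machine outputs with Platt scaling and then combining them with a logistic model. The record gives no implementation details, so the following is only a minimal standard-library sketch of that idea, not the project's code. The scores and labels are invented toy values, the function names (`fit_platt`, `calibrate`, `average_ensemble`) are hypothetical, and simple averaging is an assumed combination rule, since the abstract does not say how the probabilities are combined. Platt's original form is p = 1/(1 + exp(A·s + B)); the sketch fits the equivalent p = sigmoid(a·s + b).

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    # Platt's smoothed targets, which help when positives are rare.
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            err = sigmoid(a * s + b) - t  # d(log loss)/d(logit)
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(scores, a, b):
    # Map raw classifier scores to calibrated probabilities.
    return [sigmoid(a * s + b) for s in scores]

def average_ensemble(*prob_lists):
    # Assumed combination rule: unweighted mean of the model probabilities.
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

# Invented held-out data: 3 positives in 10 cases (class-imbalanced).
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
svm_scores = [-2.1, -1.7, -1.2, -0.9, -0.4, -0.1, 0.2, 0.3, 0.8, 1.5]   # decision values
rf_scores = [0.05, 0.10, 0.08, 0.20, 0.15, 0.30, 0.45, 0.35, 0.60, 0.70]  # vote fractions
logit_probs = [0.03, 0.05, 0.08, 0.10, 0.15, 0.22, 0.40, 0.30, 0.55, 0.80]

a_svm, b_svm = fit_platt(svm_scores, labels)
a_rf, b_rf = fit_platt(rf_scores, labels)
svm_probs = calibrate(svm_scores, a_svm, b_svm)
rf_probs = calibrate(rf_scores, a_rf, b_rf)
ensemble_probs = average_ensemble(svm_probs, rf_probs, logit_probs)

for y, p in zip(labels, ensemble_probs):
    print(y, round(p, 3))
```

In practice the calibration parameters would be fitted on data held out from classifier training, and a library routine such as scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"` could replace the hand-written fit.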