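Project title: Research on Modeling Methods for Multi-Label Medical Diagnosis Data
Project number: No.61273305
Project type: General Program
Year approved: 2013
Discipline: Automation technology; computer technology
Principal investigator: Li Guozheng (李国正)
Affiliation: China Academy of Chinese Medical Sciences
Funding: 820,000 CNY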
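Chinese abstract: Tackling the challenging task of large-scale data analysis in scientific domains will drive the development of machine learning. Traditional Chinese medicine (TCM) diagnostic cases with multiple syndromes are typical multi-label data. Existing multi-label modeling methods do not adequately account for the characteristics of TCM diagnosis data: the features are symptoms drawn from four sources (inspection, listening and smelling, inquiry, and pulse-taking); the labels occur in cases with severely imbalanced frequencies; and the rich body of medical theory has not been used effectively in modeling. Starting from the typical application of modeling multi-syndrome TCM diagnosis data, this project plans to study novel multi-label modeling methods: (1) an ensemble-learning-based method for fusing the symptoms from the four diagnostic methods; (2) a method that overcomes label imbalance by embedding a specific base classifier; and (3) a method that exploits prior knowledge by distilling TCM diagnostic theory into rules and constraints. The new methods will be validated on multi-syndrome TCM diagnosis data for conditions such as hypertension and insomnia, and on public data from other scientific domains. The aim is to improve modeling performance on specific medical tasks and to provide tools and references for data analysis in other scientific domains.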
Chinese keywords: Big data; machine learning; multi-label learning; traditional Chinese medicine; bioinformatics
English abstract: Solving the challenging task of large-scale scientific data analysis will promote the development of machine learning techniques. Medical records with multiple syndromes in traditional Chinese medicine (TCM) are typical multi-label data. Existing multi-label learning methods do not consider the characteristics of TCM diagnosis data: the features are symptoms collected by four diagnostic methods (inspection, listening and smelling, inquiry, and pulse-taking); the labels are imbalanced; and the rich theories of diagnosis are not utilized in modeling. Starting from the typical application of modeling multi-syndrome medical diagnosis data, this project plans to develop novel multi-label learning techniques: first, multi-label information fusion methods for the four symptom collections; second, imbalanced multi-label learning methods that embed a specific base learner; and third, multi-label learning methods that integrate prior knowledge from medical diagnosis theory. The novel algorithms will be applied to hypertension and insomnia data sets and to other public scientific data sets. This study aims to improve modeling accuracy and to provide tools and references for data analysis in other scientific fields.
English keywords: Big Data; Machine Learning; Multi-label Learning; Traditional Chinese Medicine; Bioinformatics
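The abstracts describe the data setting (each case has symptoms from four diagnostic sources plus a set of syndrome labels with imbalanced frequencies) and an ensemble that fuses the four sources, but they do not specify the algorithms. The following is a minimal sketch under stated assumptions, not the project's method: each symptom group is treated as a separate view, a plain k-nearest-neighbour vote with Jaccard similarity stands in for the per-view base learner, and simple score averaging stands in for the ensemble fusion. All symptom names (i1, q1, ...), syndrome labels (S1, S2, S3), and function names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical view names: one symptom group per TCM diagnostic method.
VIEWS = ("inspection", "listening_smelling", "inquiry", "pulse_taking")

# Hypothetical toy cases: a symptom set per view plus a set of syndrome labels.
TRAIN = [
    {"inspection": {"i1"}, "listening_smelling": {"l1"}, "inquiry": {"q1"},
     "pulse_taking": {"p1"}, "labels": {"S1", "S2"}},
    {"inspection": {"i1"}, "listening_smelling": {"l1"}, "inquiry": {"q1", "q2"},
     "pulse_taking": {"p1"}, "labels": {"S1", "S2"}},
    {"inspection": {"i2"}, "listening_smelling": {"l2"}, "inquiry": {"q3"},
     "pulse_taking": {"p2"}, "labels": {"S3"}},
    {"inspection": {"i1"}, "listening_smelling": {"l1"}, "inquiry": {"q1"},
     "pulse_taking": {"p1"}, "labels": {"S1"}},
]
QUERY = {"inspection": {"i1"}, "listening_smelling": {"l1"},
         "inquiry": {"q1"}, "pulse_taking": {"p1"}}


def jaccard(a, b):
    """Jaccard similarity between two symptom sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def label_frequencies(cases):
    """Fraction of cases carrying each label, which exposes label imbalance."""
    counts = Counter(label for case in cases for label in case["labels"])
    return {label: c / len(cases) for label, c in counts.items()}


def view_scores(train, query, view, k):
    """Per-view kNN vote: each label's share among the k most similar cases."""
    ranked = sorted(train, key=lambda c: jaccard(c[view], query[view]), reverse=True)
    counts = Counter(label for case in ranked[:k] for label in case["labels"])
    return {label: n / k for label, n in counts.items()}


def fused_predict(train, query, k=3, threshold=0.5):
    """Average the four per-view label scores and keep labels above a threshold."""
    fused = {label: 0.0 for case in train for label in case["labels"]}
    for view in VIEWS:
        for label, score in view_scores(train, query, view, k).items():
            fused[label] += score / len(VIEWS)
    predicted = {label for label, score in fused.items() if score >= threshold}
    return predicted, fused


if __name__ == "__main__":
    predicted, scores = fused_predict(TRAIN, QUERY)
    print(sorted(predicted))  # ['S1', 'S2']
    print(sorted(label_frequencies(TRAIN).items()))  # [('S1', 0.75), ('S2', 0.5), ('S3', 0.25)]
```

On this toy data the query receives two syndrome labels at once, which is what distinguishes multi-label prediction from single-label classification, and `label_frequencies` shows how unevenly the labels occur. A real system would replace the kNN vote with whatever base learner and imbalance handling the project develops.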