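Research on Bioinformatics Data Mining Based on Extended Fuzzy Integrals

Project name: Research on Bioinformatics Data Mining Based on Extended Fuzzy Integrals

Project number: No.61202295

Project type: Young Scientists Fund project

Year of approval: 2013

Discipline: Computer Science

Project author: Wang Jinfeng (王金凤)

Affiliation: South China Agricultural University (华南农业大学)

Funding: 240,000 yuan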

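Abstract (translated from Chinese): Variations of genes in human DNA sequences may cause disease. How to locate the important gene loci that influence a disease, and how to diagnose cases from gene variations, are problems in urgent need of solutions. Many machine learning methods have served as effective tools for bioinformatics data mining, but traditional methods mostly assume that features are mutually independent, which makes it hard for them to handle the interactions that actually exist among gene loci. The fuzzy integral is a nonlinear fusion function based on a fuzzy measure and can effectively describe the degree of interaction among features, which gives it a strong advantage in DNA data mining. This project removes the monotonicity restriction of the traditional fuzzy measure and, based on signed fuzzy measures, proposes two extended fuzzy integrals, the multiple fuzzy integral and the polynomial fuzzy integral, and applies them to mining the DNA data of hepatitis B patients. A method combining a genetic algorithm with the L1-norm is planned for determining the fuzzy measure values, judging how strongly each gene and gene combination influences the diagnostic result, and discovering the important gene loci in the DNA sequence. Individual cases are then diagnosed according to the variations of genes and gene combinations. Preliminary results indicate that diagnostic accuracy is considerably higher than with traditional methods. This research actively promotes both the theory and the application of fuzzy integrals, and provides new technical support for research in bioinformatics.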

Keywords (translated from Chinese): fuzzy integral; bioinformatics; extended research; HBV diagnosis and prediction

English abstract: Variations of genes in human DNA sequences can lead to disease. How to find the important genes and diagnose cases according to gene variations is an urgent problem. Many machine learning methods have been used as effective data mining tools for bioinformatics data, but traditional methods mostly assume that the features are independent and therefore cannot handle the interactions that exist among genes in practice. The fuzzy integral is a nonlinear fusion function based on fuzzy measures, and it can describe the degree of interaction among features very well. The fuzzy integral therefore has a strong advantage in DNA data mining. This project discards the monotonicity of the traditional fuzzy measure and proposes two kinds of generalized fuzzy integral, the multiple fuzzy integral and the polynomial fuzzy integral, which are applied to HBV data mining. We intend to use a genetic algorithm combined with the L1-norm to determine the values of the fuzzy measure and to assess the degree to which the corresponding genes or gene combinations influence the diagnosis, in order to find the important gene markers. A patient case can then be diagnosed according to the variations of genes or gene combinations. The results showed that diagnostic accuracy is greatly improved compared with traditional methods. This research will not only actively promote the d

English keywords: Fuzzy Integral; Bioinformatics; Extended Research; HBV Diagnosis and Prediction
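
Illustrative sketch: the abstract describes the fuzzy integral as a nonlinear fusion function whose measure values capture interaction among features, and it drops the monotonicity requirement on that measure. The abstract does not say which fuzzy integral the project builds on, so the Python sketch below assumes the Choquet integral, one common choice. The function name `choquet_integral`, the feature names `g1` and `g2`, and all measure values are hypothetical and chosen only for illustration. The sketch does not implement the project's multiple or polynomial fuzzy integrals, and it does not fit the measure with a genetic algorithm or the L1-norm.

```python
def choquet_integral(values, measure):
    """Choquet integral of non-negative feature values.

    values  : dict mapping feature name -> non-negative number
    measure : dict mapping frozenset of feature names -> number
              (may be non-monotonic or negative, i.e. a signed measure;
              the empty set is implicitly assigned 0)
    """
    # Visit features from the smallest value to the largest.
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        # All features whose value is at least the current one.
        upper_set = frozenset(order[i:])
        # Weight the increment in value by the measure of that set.
        total += (values[name] - previous) * measure[upper_set]
        previous = values[name]
    return total


# Hypothetical observation on two features (for example, two gene loci).
sample = {"g1": 0.2, "g2": 0.5}

# Additive measure: no interaction, so the integral is a weighted sum.
additive = {
    frozenset({"g1"}): 0.3,
    frozenset({"g2"}): 0.4,
    frozenset({"g1", "g2"}): 0.7,
}

# The pair counts for more than the sum of its parts.
interacting = dict(additive)
interacting[frozenset({"g1", "g2"})] = 1.0

# Non-monotonic (signed) measure: the pair counts for less than g2 alone.
signed = dict(additive)
signed[frozenset({"g1", "g2"})] = 0.2

additive_result = choquet_integral(sample, additive)
interacting_result = choquet_integral(sample, interacting)
signed_result = choquet_integral(sample, signed)
```

With the additive measure the result equals the plain weighted sum 0.3 × 0.2 + 0.4 × 0.5, approximately 0.26. Changing only the measure of the pair moves the result to approximately 0.32 for the interacting measure and approximately 0.16 for the non-monotonic one, which is the sense in which the measure encodes the contribution of a feature combination.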
