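Project title: Evaluation algorithms for feature selection and molecular network construction based on high-throughput crop expression profile data
Project number: No.61272207
Project type: General Program
Year of approval: 2013
Project discipline: Automation technology; computer technology
Project author: Liang Yanchun (梁艳春)
Author affiliation: Jilin University
Project funding: 800,000 CNY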
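Chinese abstract: Building on a collection of existing high-throughput crop expression profile data, and through analysis of high-throughput sequencing data and microarray data, this project addresses the small-sample nature of crop high-throughput data by developing feature selection algorithms that integrate multiple sources of information such as GO, Pathway, and QTL. Using the proposed multi-stage, multi-Agent ensemble learning method to fuse information across biological processes and stages, it selects from massive, high-dimensional, heterogeneous data the sets of genes, RNAs, and proteins most relevant to specific agronomic traits. It further studies hybrid data-coupled modeling methods that combine feature selection with network information, constructs complex molecular networks relating crop genes to agronomic traits, and, by introducing and modeling known QTL data, evaluates and validates the network construction results by multiple means. The project will propose and develop a pipeline and platform based on crop high-throughput expression profile data, with which researchers can analyze the mechanisms of specific agronomic traits, construct the related molecular networks, and evaluate the resulting mechanism networks against multiple data sources. By combining and analyzing mechanism and data, it aims to reveal the molecular genetic and metabolic mechanisms underlying the corresponding agronomic traits, providing an example for molecular breeding design and for the molecular genetic improvement of complex traits in complex genomes.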
Chinese keywords: gene expression profile; feature selection; molecular network
English abstract: Building on a collection of existing high-throughput agricultural data and on analysis of high-throughput sequencing and microarray data, the proposed project aims to develop novel feature selection algorithms that incorporate Gene Ontology (GO), Pathway, and Quantitative Trait Locus (QTL) information, guided by known mechanistic knowledge. Data integration is a main challenge of systems biology, and the small-sample, high-dimensional nature of high-throughput data makes classical techniques inefficient; the extremely small sample sizes of agricultural high-throughput data make this feature selection problem harder still. To merge biological information from different stages and aspects of biological processes, we propose a novel ensemble algorithm with multiple stages and multiple Agents that combines multiple algorithms and then uses appropriate scoring methods to choose consensus results. The proposed project will first present a novel algorithm based on Support Vector Machine - Recursive Feature Elimination (SVM-RFE) within the proposed ensemble framework. This algorithm not only considers the kernel width obtained from SVM training results, but also adds weights to specific features when they belong to the same pathway. The
English keywords: gene expression; feature selection; molecular network
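
Illustrative sketch: the English abstract describes an SVM-RFE variant that adds weight to features sharing a pathway. The record does not give the actual algorithm, so the Python below is only a minimal sketch under stated assumptions. It uses a linear SVM trained by subgradient descent rather than the kernel SVM the abstract implies, so kernel width is not modelled. The pathway bonus formula, the function names `train_linear_svm` and `svm_rfe`, the parameter `alpha`, and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, features, epochs=200, lr=0.01, lam=0.01):
    """Fit a linear SVM on the given feature subset by subgradient descent
    on the L2-regularised hinge loss. Returns a dict: feature index -> weight."""
    w = {j: 0.0 for j in features}
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[j] * xi[j] for j in features) + b)
            violated = margin < 1.0
            for j in features:
                w[j] *= 1.0 - lr * lam          # shrinkage from the L2 penalty
                if violated:
                    w[j] += lr * yi * xi[j]     # hinge-loss subgradient step
            if violated:
                b += lr * yi
    return w


def svm_rfe(X, y, n_keep, pathways=(), alpha=0.0):
    """Recursive feature elimination: retrain, score each feature by w_j**2,
    drop the lowest-scoring feature, repeat until n_keep features remain.

    ASSUMPTION (not from the source): every surviving member of a pathway
    with at least two surviving members receives a bonus of
    alpha * mean(w_j**2 over those members)."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features)
        score = {j: w[j] ** 2 for j in features}
        for pathway in pathways:
            members = [j for j in features if j in pathway]
            if len(members) >= 2:
                bonus = alpha * sum(w[j] ** 2 for j in members) / len(members)
                for j in members:
                    score[j] += bonus
        features.remove(min(features, key=lambda j: score[j]))
    return features


# Hypothetical toy data: 6 samples x 4 "genes", labels in {+1, -1}.
# Gene 0 is strongly informative, gene 1 weakly, genes 2 and 3 are noise.
X = [
    [2.0, 0.3, 0.2, -0.2],
    [2.1, 0.2, -0.2, 0.1],
    [1.9, 0.4, 0.1, 0.2],
    [-2.0, -0.3, 0.1, -0.1],
    [-2.1, -0.2, -0.2, 0.2],
    [-1.9, -0.4, 0.0, -0.1],
]
y = [1, 1, 1, -1, -1, -1]

plain = svm_rfe(X, y, n_keep=2)                                  # [0, 1]
boosted = svm_rfe(X, y, n_keep=2, pathways=[{0, 3}], alpha=1.0)  # [0, 3]
print(plain, boosted)
```

In this toy run, plain RFE keeps the two genes with the largest weights, while declaring genes 0 and 3 to be in one pathway lets the weak gene 3 survive alongside its strong pathway partner. That is one possible reading of "adding weights to features in the same pathway", not the project's confirmed method.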