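Project title: Mining of Rice Disease-Resistance Genes Based on Bioinformatics and Natural Language Processing
Project number: No.61202305
Project type: Young Scientists Fund project
Approval year: 2013
Discipline: Computer Science
Project leader: 夏静波 (Xia Jingbo)
Affiliation: Huazhong Agricultural University
Funding: 220,000 CNY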
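Chinese abstract: With the completion of rice genome sequencing and the arrival of the gene annotation era, a large amount of bioinformatics data and biological literature has accumulated on rice disease-resistance genes/proteins, yet the discovery of functional rice disease-resistance genes still lags behind. Building on the group's previous research in functional gene mining and bioinformatics algorithms, this project combines bioinformatics methods based on sequence, structure, and differential expression analysis with semantics-based biomedical natural language processing methods to mine resistance genes against rice bacterial blight, rice blast, and other diseases. First, a preliminary set of candidate resistance genes is built using microarray-based differential gene expression analysis and a semantic method based on the co-occurrence of genes and pathogens. Next, sequence/structure information, Gene Ontology information, literature semantic term information, and biological event extraction information are extracted, and four classification predictors are built with support vector machines. Finally, a neural network is used to build a comprehensive evaluation system that integrates the multiple classifiers; after system self-testing and refinement, this yields a reliable rice disease-resistance gene discovery system. By effectively combining text-mining-based natural language processing methods with traditional de novo prediction methods based on sequence and structure, the project can shorten the screening process for rice disease-resistance genes, and the reliability of its data and results is expected to be higher than that of traditional bioinformatics methods.
Chinese keywords: biomedical natural language processing; rice; knowledge discovery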
English abstract: With the completion of rice genome sequencing and the beginning of the gene annotation era, research on rice disease-resistance genes/proteins has accumulated a large amount of bioinformatics data and biological literature. However, the discovery of functional rice disease-resistance genes still lags behind. Building on our previous work in functional gene discovery and bioinformatics algorithms, and combining bioinformatics methods based on sequence, structure, and gene expression analysis with semantics-based natural language processing methods, we construct a gene discovery system for rice genes conferring resistance to Xanthomonas oryzae pv. oryzae (bacterial blight) and Magnaporthe grisea (rice blast). First, we use a microarray method and a text mining method to build a filtered candidate gene dataset. Then we extract sequence/structure information, Gene Ontology information, literature semantic term information, and biomedical event extraction information, and build four classification predictors with support vector machines. Finally, we use an artificial neural network to combine the multiple classifiers into a comprehensive evaluation system, and we construct a credible rice disease-resistance gene discovery system through cross-validation. Methods used in this research combine text m
English keywords: Biomedical natural language processing; Rice; Knowledge discovery
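The design described in both abstracts, four SVM predictors each trained on one type of evidence and a neural network that combines their outputs, is a form of stacked ensemble learning. The following is a minimal, illustrative sketch under stated assumptions and is not the project's implementation: it assumes linear SVMs trained with the Pegasos stochastic subgradient method (a swapped-in training choice), reduces the neural network to its simplest case (a single logistic unit), and uses synthetic data with four hypothetical 2-feature views standing in for the sequence/structure, Gene Ontology, literature-term, and event-extraction features. All function names are hypothetical.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM fitted with Pegasos-style stochastic subgradient descent.

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # L2 shrinkage, then a hinge-loss step only if the margin is violated
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_score(w, x):
    """Signed SVM decision value (positive = predicted resistance gene)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_meta_logistic(Z, y01, lr=0.1, epochs=500):
    """Single logistic unit over the four SVM scores, trained by batch gradient descent."""
    n, d = len(Z), len(Z[0])
    v, c = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_v, grad_c = [0.0] * d, 0.0
        for z, target in zip(Z, y01):
            err = sigmoid(sum(vj * zj for vj, zj in zip(v, z)) + c) - target
            for j in range(d):
                grad_v[j] += err * z[j]
            grad_c += err
        v = [vj - lr * g / n for vj, g in zip(v, grad_v)]
        c -= lr * grad_c / n
    return v, c


def fit_stacked(views, y):
    """views[k][i] is sample i's feature vector for evidence source k."""
    svms = [train_linear_svm(X, y) for X in views]
    # One row per sample: the four per-view SVM scores become the meta-features
    Z = [[svm_score(w, X[i]) for w, X in zip(svms, views)] for i in range(len(y))]
    meta = train_meta_logistic(Z, [1 if label > 0 else 0 for label in y])
    return svms, meta


def meta_logit(model, sample_views):
    """Combiner's pre-sigmoid output for one candidate gene."""
    svms, (v, c) = model
    z = [svm_score(w, x) for w, x in zip(svms, sample_views)]
    return sum(vj * zj for vj, zj in zip(v, z)) + c


def predict_proba(model, sample_views):
    """Combined probability that a candidate gene is a resistance gene."""
    return sigmoid(meta_logit(model, sample_views))


# Synthetic toy data (illustrative only): 4 hypothetical evidence views, 2 features each
rng = random.Random(42)
y = [1] * 10 + [-1] * 10
samples = [[[lab * (1 + 0.5 * rng.random()), lab * (1 + 0.5 * rng.random())]
            for _ in range(4)] for lab in y]
views = [[s[k] for s in samples] for k in range(4)]

model = fit_stacked(views, y)
print(predict_proba(model, [[1.2, 1.1]] * 4))
print(predict_proba(model, [[-1.2, -1.1]] * 4))
```

For brevity, the combiner here is trained on SVM scores computed on the same samples the SVMs were fitted on. A more careful stacking setup would train it on out-of-fold scores (for example, from cross-validation), so that the combiner does not overtrust views that overfit.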