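Project title: Research on Protein Complex Prediction Methods Based on Biomedical Literature and Domain Ontologies
Project number: No. 61300088
Project type: Young Scientists Fund project
Year of approval: 2014
Project discipline: Automation technology; computer technology
Project author: Zhang Yijia (张益嘉)
Author affiliation: Dalian University of Technology (大连理工大学)
Project funding: 230,000 yuan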
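Chinese abstract: Predicting protein complexes in protein networks is an important basis for exploring the mechanisms of various life activities and is of great significance for a deeper understanding of living systems. Currently available public protein network data can only represent the topological structure among proteins, so complex prediction research cannot exploit the important functional properties of complexes. To address this core problem, this project uses natural language processing methods to extract the protein interaction category information contained in biomedical literature and integrates Gene Ontology resources to construct a protein biological attributed network; based on attributed graph clustering theory, it establishes a distance model for the biological attributed network that fuses two kinds of heterogeneous information, network topology and biological attributes; and, combined with Core-Attachment structure theory, it builds an efficient protein complex prediction model. Starting from mining and integrating domain knowledge from biomedical literature and the Gene Ontology, this project not only provides important biological attribute information for complex prediction research but also proposes a theoretical framework for integrating multiple sources of domain knowledge for complex prediction. This allows protein complex prediction research to combine the structural characteristics and functional properties of complexes organically, and provides new ideas and a theoretical basis for building efficient complex prediction methods.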
Chinese keywords: Natural language processing;Relation extraction;Text mining;Protein complex identification;Bioinformatics
English abstract: Protein complex prediction in protein networks is an important foundation for exploring the mechanisms of various life activities and is significant for improving our understanding of living systems. Current protein networks contain only topological information about proteins, so protein complex prediction cannot exploit the functional features of protein complexes. To solve this core problem, this project uses natural language processing methods to extract category information about protein-protein interactions from the biomedical literature and integrates the Gene Ontology resource. Based on this biological information, we construct protein biological attributed networks. Furthermore, we propose a distance model for protein biological attributed networks, based on attributed graph clustering theory, that combines two heterogeneous kinds of information: network topology and biological attributes. Ultimately, we build an efficient model for protein complex prediction in protein biological attributed networks based on core-attachment theory. This project starts by mining the biomedical literature and integrating the Gene Ontology resource, which provides vital biological attribute information for protein complex prediction and a theoretical framework for integrating diverse domain knowledge to predict protein complexes. This project can
English keywords: Natural language processing;Relation extraction;Text mining;Protein complex identification;Bioinformatics
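
Illustrative sketch: The abstract describes a distance model that fuses network topology with biological attributes, but this record does not give its formula. As a rough illustration only, the following Python sketch blends a neighborhood-overlap distance with an annotation-overlap distance. The Jaccard measure, the weight `alpha`, the function names, and the toy proteins and term labels (`T1` to `T3`) are all assumptions made for this example and are not taken from the project.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_distance(u, v, neighbors, annotations, alpha=0.5):
    """Blend a topological distance and an attribute distance (assumed form).

    alpha weights the topology term; (1 - alpha) weights the attribute term.
    """
    # Closed neighborhoods: each protein's interaction partners plus itself.
    nu = neighbors.get(u, set()) | {u}
    nv = neighbors.get(v, set()) | {v}
    topo = 1.0 - jaccard(nu, nv)
    # Attribute distance from the overlap of (hypothetical) ontology terms.
    attr = 1.0 - jaccard(annotations.get(u, set()), annotations.get(v, set()))
    return alpha * topo + (1.0 - alpha) * attr


# Toy attributed network: placeholder proteins and placeholder term labels.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

annotations = {
    "A": {"T1", "T2"},
    "B": {"T1", "T2"},
    "C": {"T2"},
    "D": {"T3"},
}

d_ab = combined_distance("A", "B", neighbors, annotations)
d_cd = combined_distance("C", "D", neighbors, annotations)
```

In this toy network, `A` and `B` share both their closed neighborhood and their term set, so their distance is 0, while `C` and `D` interact directly but share no terms, which pushes them apart. A clustering step such as a core-attachment procedure could then operate on these pairwise distances; that step is not sketched here.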