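Project title: Prediction and Statistical Analysis of DNA/RNA-Binding Proteins
Project number: No. 61305072
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Ma Xin (马昕)
Author affiliation: Nanjing Audit University (南京审计学院)
Funding: 230,000 CNY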
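Abstract (translated from Chinese): DNA/RNA-binding proteins play a vital role in cellular activity. Existing international studies that predict DNA/RNA-binding proteins with machine learning algorithms identify binding proteins solely by extracting protein sequence features or structural features, while the most effective evidence for identifying a DNA/RNA-binding protein, namely the presence of DNA/RNA-binding residues, has not been considered. Building on the previously developed and refined DNA/RNA-binding residue prediction models, this project will use information about the predicted binding residues to determine whether a given protein is a DNA/RNA-binding protein. The research comprises: (1) multi-faceted statistical analysis of the binding residues predicted in DNA-binding versus non-binding proteins and in RNA-binding versus non-binding proteins; (2) using the results of this analysis to construct features that differ significantly between binding and non-binding proteins, and applying machine learning methods to sequence information to obtain a DNA-binding protein prediction model and an RNA-binding protein prediction model; (3) building a DNA-binding protein prediction platform and an RNA-binding protein prediction platform to provide technical and data support for protein function research and drug design.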
Keywords (translated from Chinese): DNA-binding proteins;RNA-binding proteins;feature selection;machine learning;
English abstract: DNA/RNA-binding proteins play critical roles in cellular functions. Internationally, machine learning methods for predicting DNA/RNA-binding proteins usually extract protein sequence features or structural features to identify binding proteins. However, the presence of DNA/RNA-binding residues has received almost no consideration. Building on the well-established DNA-binding residue prediction model DNABR and RNA-binding residue prediction model PRBR, we will present a method to identify whether a query protein is a DNA-binding protein (or RNA-binding protein) by using information about the DNA-binding residues (or RNA-binding residues) in its sequence. The research contents are as follows: (1) statistical analysis of binding residues in DNA-binding proteins (RNA-binding proteins) and non-binding proteins; (2) using the statistical results to construct features that differ significantly between binding proteins and non-binding proteins, and building prediction models for DNA-binding proteins and RNA-binding proteins from amino acid sequences using machine learning methods; (3) two web servers, built on the DNA-binding protein model and the RNA-binding protein model respectively, to help researchers predict DNA-binding proteins and RNA-binding proteins efficiently.
English keywords: DNA-binding proteins;RNA-binding proteins;feature selection;machine learning;
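
Illustrative note: the abstract proposes turning residue-level binding predictions into protein-level features. The sketch below shows one way such summary statistics might be computed. It assumes a residue-level predictor (such as DNABR or PRBR) returns one 0/1 label per residue; that output format, the function name `residue_summary_features`, and the three feature names are hypothetical and are not taken from the project itself.

```python
from itertools import groupby


def residue_summary_features(pred_labels):
    """Summarise per-residue binding predictions (1 = predicted binding) for one protein.

    The input format and the features are illustrative assumptions,
    not the project's actual feature set.
    """
    n = len(pred_labels)
    if n == 0:
        raise ValueError("empty prediction list")
    # Length of each maximal run of consecutive predicted binding residues.
    runs = [sum(1 for _ in group)
            for label, group in groupby(pred_labels) if label == 1]
    return {
        "binding_fraction": sum(pred_labels) / n,
        "num_binding_segments": len(runs),
        "longest_binding_segment": max(runs, default=0),
    }


# Hypothetical per-residue output for a 10-residue protein.
example = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
features = residue_summary_features(example)
print(features)
```

Features of this kind could then be compared between binding and non-binding proteins and passed to a classifier; which statistics actually separate the two groups is what the project's statistical analysis is meant to establish.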