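Project title: Research on Feature Information Extraction and Classification Algorithms in Protein Structural Class Prediction
Project number: No.11426056
Project type: Special fund project
Year approved: 2015
Discipline: Mathematical and Physical Sciences and Chemistry
Principal investigator: Ding Shuyan (丁淑妍)
Institution: Dalian Nationalities University (大连民族学院)
Funding amount: CNY 30,000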
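Chinese abstract: Protein structural class prediction plays an important role in areas such as protein secondary structure prediction and the prediction of protein spatial structure and function. This project addresses feature information extraction and fusion in protein structural class prediction, focusing on how to obtain effective feature information comprehensively and how to design classification strategies that can fuse multi-source feature information. The main research contents are: combining a Markov chain model, a word statistical model, and information entropy to define a subsequence overlap degree, using it to classify subsequences, and on that basis studying the structural differences between different subsequences; using multivariate statistical methods to extract the implicit intrinsic correlations in the position-specific scoring matrix, both between amino acid residues and between different mutation cases, and ultimately determining a reasonable maximum spacing range between amino acids; and introducing fuzzy neural network techniques into the classification strategy for protein structural class prediction to fuse multi-source information effectively and improve prediction accuracy. The research is based on existing benchmark datasets, and sufficient, stable independent datasets will also be constructed for validation. The results of this project will support research on protein spatial structure and function, and can also provide new ideas for analyzing protein structural class information and designing application algorithms.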
Chinese keywords: Machine learning; Information extraction; Information fusion; Protein structural classes
English abstract: Knowledge of a protein's structural class plays an important role in predicting its secondary structure, tertiary structure, and function from the amino acid sequence. This project addresses feature extraction and information fusion in protein structural class prediction, focusing on how to extract effective structural features from protein sequences and how to design classification strategies that fuse multi-source information. The main contents include: using multivariate statistical methods to extract features from the position-specific scoring matrix (PSSM) that reflect relationships between different amino acids and between different matrix columns; combining a Markov chain model, a word statistical model, and information entropy to define a subsequence overlap degree, then studying the structural differences among different subsequences; and fusing the multi-source information with a fuzzy neural network classification strategy to improve the accuracy of protein structural class prediction. The research is based on public datasets, and sufficient independent datasets will be constructed to test the method. The results of this project will not only contribute to the study of protein spatial structure and function, but also provide new ideas for analyzing protein structural class information and designing application algorithms.
English keywords: Machine learning; Information extraction; Information fusion; Protein structural classes
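The abstract mentions extracting correlations between amino acid residues from the position-specific scoring matrix and choosing a maximum spacing between amino acids, but it gives no formulas. The Python below is a minimal sketch of one common style of gapped PSSM feature under that reading: for each gap g from 1 to a chosen maximum, it averages the product of scores at positions k and k+g over every pair of amino acid columns. The logistic normalization, the function names, and the feature layout are illustrative assumptions and are not taken from the project.

```python
import math
from typing import List

# Hypothetical representation: a PSSM as L rows (sequence positions) x 20 columns (amino acids).
Matrix = List[List[float]]


def sigmoid_normalize(pssm: Matrix) -> Matrix:
    """Map raw PSSM log-odds scores into (0, 1) with the logistic function.

    This preprocessing step is an assumption; typical PSSM scores are small
    integers, so math.exp does not overflow here.
    """
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def gapped_pssm_features(pssm: Matrix, max_gap: int) -> List[float]:
    """Average products of PSSM scores at positions k and k+g, for g = 1..max_gap.

    For each gap g and each column pair (i, j), the feature is
    (1 / (L - g)) * sum_k pssm[k][i] * pssm[k + g][j].
    Returns a flat vector of length max_gap * n_cols * n_cols.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("empty PSSM")
    n_cols = len(pssm[0])
    if max_gap < 1 or max_gap >= length:
        raise ValueError("max_gap must satisfy 1 <= max_gap < sequence length")

    features: List[float] = []
    for g in range(1, max_gap + 1):
        pairs = length - g  # number of (k, k + g) position pairs
        for i in range(n_cols):
            for j in range(n_cols):
                total = sum(pssm[k][i] * pssm[k + g][j] for k in range(pairs))
                features.append(total / pairs)
    return features
```

With a 20-column PSSM the vector has 400 entries per gap, so its length grows as max_gap × 400. Comparing classifier accuracy across several values of max_gap, for example with cross-validation, is one plausible way to settle on a "reasonable maximum spacing", though the abstract does not say how the project does this.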
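The abstract combines a Markov chain model, word statistics, and information entropy to define a subsequence overlap degree, but it does not state that definition, so none is attempted here. The Python sketch below only illustrates two of the named building blocks: the Shannon entropy of a sequence's overlapping k-mer ("word") distribution, and maximum-likelihood first-order Markov transition probabilities between residues. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution of seq."""
    if k < 1 or k > len(seq):
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def markov_transitions(seq: str) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood first-order Markov transition probabilities P(b | a).

    Counts each adjacent residue pair (a, b) and divides by how often a
    appears as the first element of a pair. Unseen pairs are omitted.
    """
    pair_counts = Counter(zip(seq, seq[1:]))
    from_counts = Counter(seq[:-1])
    return {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}
```

For example, `kmer_entropy("ACAC", 1)` returns `1.0` (two residues, equally frequent), and `markov_transitions("ABAB")` returns `{('A', 'B'): 1.0, ('B', 'A'): 1.0}`.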