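Project title: Research on Feature Representation and Classification Algorithms for Protein Supersecondary Structures
Project number: No.61309013
Project type: Young Scientists Fund project
Approval year: 2014
Discipline: Automation technology, computer technology
Project author: Zou Dongsheng (邹东升)
Affiliation: Chongqing University (重庆大学)
Funding amount: 230,000 yuan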
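Chinese abstract: Predicting protein supersecondary structures is of considerable scientific importance for elucidating the mechanisms of protein spatial folding and function. Devising effective feature representations for the sequence patterns of supersecondary structures, and designing learning algorithms with low input-space dimensionality and high classification accuracy, are the current bottlenecks and difficulties in supersecondary structure prediction. Existing feature extraction methods do not take into account the order and coupling information of the sequence, long-range residue interactions, or the statistical distribution of residues along the sequence, and existing classification algorithms suffer from high input-space dimensionality, low accuracy, and slow computation. To address these shortcomings, this project uses multi-feature fusion to construct a feature representation for supersecondary structure sequences based on amino acid composition, polypeptide composition, and amino acid composition distribution. It adopts a learning algorithm that combines the diversity increment with a twin support vector machine to reduce the input-space dimensionality, lower the computational cost, and improve prediction accuracy. It further proposes a parameter optimization method based on particle swarm optimization, which overcomes the sample imbalance problem through parameter tuning. The research can resolve the main shortcomings of current supersecondary structure prediction work in sequence-pattern feature representation and classification algorithms, and can provide a scientific basis for sustained research in fields such as biopharmaceutical design and agricultural biotechnology in China.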
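Chinese keywords: supersecondary structure; feature representation; diversity measure; quadratic discriminant analysis; twin support vector machine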
English abstract: The prediction of protein supersecondary structures is of considerable scientific importance for understanding three-dimensional protein folding and functional mechanisms. Supersecondary structure prediction currently faces two bottlenecks. One is representing the feature information of the sequence patterns of protein structural motifs completely. The other is finding classification algorithms with low-dimensional input vectors and high accuracy. Existing research has limitations on both fronts. On the one hand, current feature representation methods cannot take into account evolutionary information, such as order and coupling information, segmental distribution, and long-distance effects of amino acids. On the other hand, the main inadequacies of present classification algorithms are the high dimension of their input vectors, low accuracy, and slow computation. To address these limitations, this work proposes a novel feature representation method that combines basic amino acid composition, polypeptide composition, and amino acid composition distribution. The study also attempts to reduce the dimension of the input vector and to improve prediction accuracy by combining the diversity increment measure with a twin support vector machine as the classification algorithm. Furthermore, a parameter optimization method based on particle swarm
English keywords: supersecondary structure; feature representation; diversity measure; quadratic discriminant analysis; twin support vector machine
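
Illustrative sketch (not part of the project record): the abstract names amino acid composition, polypeptide composition, and the diversity increment measure as building blocks. The Python below shows one possible reading of those terms, under stated assumptions: dipeptides stand in for "polypeptide composition", the diversity measure is taken in its commonly used form D(X) = N log N - sum(n_i log n_i), and the diversity increment is ID(X, Y) = D(X + Y) - D(X) - D(Y). All function names are hypothetical, and none of this is confirmed as the project's actual implementation.

```python
import math
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_counts(seq):
    """Return a 20-dimensional vector of residue counts (other symbols are ignored)."""
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    return [counts[a] for a in AMINO_ACIDS]


def dipeptide_counts(seq):
    """Return a 400-dimensional vector of overlapping dipeptide counts.

    Assumption: dipeptides are used here as the simplest case of
    'polypeptide composition'; the record does not specify the length.
    """
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [pairs[a + b] for a, b in product(AMINO_ACIDS, repeat=2)]


def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i), with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def diversity_increment(x, y):
    """Diversity increment ID(X, Y) = D(X + Y) - D(X) - D(Y).

    It is 0 when the two count vectors are proportional and grows as the
    two sources become more dissimilar.
    """
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```

Under these assumptions, one way to obtain a low-dimensional input would be to compute the diversity increment between a query sequence's count vector and a reference count vector for each class, then pass those few values to the classifier in place of the full 420 counts. Whether the project does exactly this is not stated in the record.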