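Research on Novel Similarity Measures for Protein Tertiary Structures and High-Dimensional Adaptive Clustering Algorithms

Project title: Research on Novel Similarity Measures for Protein Tertiary Structures and High-Dimensional Adaptive Clustering Algorithms

Project number: No. 61272213

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Project leader: 路永钢 (Lu Yonggang)

Affiliation: Lanzhou University

Funding: 800,000 CNY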

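Chinese abstract: Similarity measurement and cluster analysis of protein tertiary structures play a very important role in both protein structure prediction and protein function prediction. The similarity measures in common use today are all based on inter-residue distances. Although such measures help reveal global similarity, they treat the protein as a rigid body, which makes it hard to reflect protein flexibility or to detect local structural similarity. Identifying local structural similarity, however, is very helpful both for predicting protein function and for uncovering the large amount of information left behind by protein evolution. It is therefore essential to develop new, more principled similarity measures that capture both global and local structural similarity. In addition, several of the best current ab initio methods for predicting protein tertiary structure use cluster analysis for the final selection among predicted structures. This step is one of the key points of these methods, yet the clustering algorithms used in it are all constrained by parameters that are set empirically. This project will first investigate new similarity measures better suited to protein structures, and then, building on our recent results on potential-based adaptive clustering algorithms, design new and more effective clustering algorithms suited to protein structure prediction.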
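To make the contrast between a global, residue-distance-based score and local structural similarity concrete, here is a minimal sketch under stated assumptions: two conformations of the same chain are given as aligned C-alpha coordinates, and the global score is the distance RMSD (dRMSD), one residue-distance-based measure chosen here because it needs no superposition step. The sliding-window variant is a hypothetical illustration only; neither function is the new measure proposed by this project, and the names `drmsd` and `windowed_drmsd` and the toy coordinates are assumptions.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations of the same chain.

    coords_a, coords_b: equal-length lists of (x, y, z) C-alpha coordinates,
    residue i in one list corresponding to residue i in the other.
    Compares the two intra-chain distance matrices, so no superposition is
    needed, but every residue pair still feeds into one global number.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length chains with at least 2 residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    sq_sum = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(sq_sum / len(pairs))


def windowed_drmsd(coords_a, coords_b, window):
    """Hypothetical local variant: dRMSD of every contiguous `window`-residue segment."""
    return [
        drmsd(coords_a[s:s + window], coords_b[s:s + window])
        for s in range(len(coords_a) - window + 1)
    ]


# Toy example (assumed coordinates): a straight 6-residue chain with 3.8 A spacing
# versus the same chain bent by 90 degrees at residue 2 (a rigid "hinge" motion).
straight = [(3.8 * i, 0.0, 0.0) for i in range(6)]
bent = straight[:3] + [(7.6, 3.8 * k, 0.0) for k in (1, 2, 3)]

print(round(drmsd(straight, bent), 2))                                  # global score
print([round(v, 2) for v in windowed_drmsd(straight, bent, window=3)])  # per-segment scores
```

In this toy case the global dRMSD is well above zero, while only the window spanning the hinge (residues 1 to 3) has a clearly non-zero local score; the other windows are zero up to floating-point rounding. A single global number therefore hides the fact that most local segments are unchanged.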

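Chinese keywords: cluster analysis; protein structure alignment; protein tertiary structure prediction; protein tertiary structure clustering; high-dimensional space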

English abstract: Measuring structural similarity and performing cluster analysis play important roles in protein tertiary structure prediction and protein function prediction. Currently, the most widely used protein structure similarity measures are all based on the distances between residues. Although these measures are suitable for identifying global structural similarity, they are poorly suited to identifying local structural similarity, and they treat proteins as rigid bodies, ignoring the flexible nature of protein structures. However, local structural similarity is key to protein function prediction and is very helpful for recovering useful information left by evolution. It is therefore very important to design a new protein structure similarity measure that can identify both global and local structural similarity. Currently, the most effective ab initio protein tertiary structure prediction methods all use cluster analysis to select the final predictions from large sets of decoys. This step is one of the bottlenecks of these methods. However, the clustering algorithms used here are all limited by a few parameters whose values must be determined empirically. We will first concentrate on the design of a more suitable similarity measure for protein structures. Then, based on this work and our recent research on potential-based adaptive clustering algorithms, we will design new, more effective clustering algorithms for protein structure prediction.
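The abstract notes that leading ab initio methods cluster large decoy sets to pick final models, and that the clustering depends on empirically set parameters. As a minimal sketch of one simple scheme of this kind (an assumption, not the project's potential-based adaptive algorithm), assume a precomputed pairwise distance matrix between decoys, for example from RMSD or dRMSD: repeatedly take the decoy with the most neighbours within a fixed cutoff as a cluster center, assign those neighbours to it, and remove them. The function name `greedy_cutoff_clustering`, the one-dimensional toy "decoys" and the cutoff values are hypothetical.

```python
def greedy_cutoff_clustering(dist, cutoff, max_clusters=5):
    """Greedy cutoff clustering of decoys (a simple illustrative scheme).

    dist: symmetric n x n matrix of pairwise structural distances.
    cutoff: fixed neighbour threshold, i.e. the empirically chosen parameter.
    Returns [(center_index, member_indices), ...], largest cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        def neighbours(i):
            # Unassigned decoys within `cutoff` of decoy i (i itself included).
            return sorted(j for j in remaining if dist[i][j] <= cutoff)

        # Densest decoy becomes the cluster center; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = neighbours(center)
        clusters.append((center, members))
        remaining.difference_update(members)
    return clusters


# Hypothetical toy "decoys" placed on a line, so |x_i - x_j| plays the role of RMSD.
positions = [0.0, 0.5, 1.0, 1.2, 5.0, 5.3, 9.0]
dist = [[abs(a - b) for b in positions] for a in positions]

for cutoff in (1.0, 0.4):
    top_center, top_members = greedy_cutoff_clustering(dist, cutoff)[0]
    print(f"cutoff={cutoff}: selected decoy {top_center}, cluster {top_members}")
```

With a cutoff of 1.0 the top cluster is centered on decoy 1 (members 0 to 3); with 0.4 it shrinks to decoys 2 and 3 and the selected model changes to decoy 2. This sensitivity of the final pick to a hand-tuned threshold is the kind of empirical parameter dependence the abstract describes.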

English keywords: Clustering; Protein structure comparison; Protein tertiary structure prediction; Protein tertiary structure clustering; High dimensional space
