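Project title: Research on Protein Structural Motif Identification and Structure Prediction Algorithms
Project number: No.61272318
Project type: General Program
Year approved: 2013
Discipline: Automation Technology, Computer Technology
Principal investigator: Bu Dongbo (卜东波)
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Funding: 800,000 CNY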
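Chinese abstract (translated): The threading approach, which infers a protein's spatial structure from its sequence, can in principle exploit the structural information of templates. However, the sequence-structure alignments produced by existing algorithms still need improvement, and the fold recognition rate for remote-homolog proteins is only about 2/3. This project follows the idea of "examining the relationship between sequence and structure in the joint sequence-structure-evolution space". It uses network-flow techniques to identify the conserved structural framework of remote-homolog proteins; uses H-form to characterize the key structural motifs that maintain the stability of the protein structural framework; extends chain conditional random fields to tree conditional random fields, so that sequence-structure alignment can take key structural motif information into account; and uses linear programming to optimize the energy function, reducing the Golf-hole phenomenon on the energy surface and enlarging the attraction basin that contains the native conformation, thereby raising the likelihood of finding the native conformation. As final deliverables, the project will provide a high-performance structure prediction software package and an open prediction web server. Protein structure is a typical representative of a large class of common problems: a linear sequence takes on a complex structure under the combined influence of local and global interactions. The results of this project are expected to inspire solutions to similar problems in fields such as information retrieval, advancing and enriching information science.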
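Chinese keywords (translated): Protein structure prediction;residue interaction prediction;residue burial prediction;Markov random field;linear programming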
English abstract: Threading approaches aim to predict a protein's 3D structure from its primary sequence. Because they can draw on the structural information of templates, threading approaches are regarded as the most effective and accurate strategy for protein structure prediction. However, existing approaches suffer from low-quality sequence-structure alignments, and the fold recognition ratio is only 2/3 for remote-homolog proteins. The project follows the strategy of examining the sequence-structure relationship in the joint sequence-structure-evolution space. Specifically, we employ a network-flow technique to identify the most conserved structural framework shared by a set of remote homologs, and define H-form to capture the conserved structural motifs. We further extend chain CRFs (conditional random fields) to tree CRFs to take long-distance contacts into consideration. To reduce the potential Golf-hole phenomenon in the energy landscape, a linear programming technique is used to maximize the attraction basin in which the native conformation lies. In this way, the probability of reaching the native structure is significantly increased. Preliminary experimental results suggest the effectiveness of the strategies in the study. We will also implement all algorithms in a practical software package, and provide a prediction service through the internet for the communi
English keywords: Protein structure prediction;residue-residue contacts;solvent accessibility;Markov random field;linear programming