Research on Machine-Learning-Based Prediction of Protein Post-Translational Modification Sites

Project title: Research on Machine-Learning-Based Prediction of Protein Post-Translational Modification Sites

Project number: No.11301024

Project type: Young Scientists Fund project

Year approved: 2014

Discipline: Mathematical and Physical Sciences and Chemistry

Principal investigator: Xu Yan (徐岩)

Institution: University of Science and Technology Beijing

Funding: 220,000 CNY

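Chinese abstract: Protein post-translational modification is an important mechanism for regulating protein function: it makes protein function more complete and its regulation more precise. Recent studies have found that protein post-translational modifications are closely associated with many diseases, including cancer, aging, heart disease, and dementia. Accurately identifying post-translational modification sites is therefore important not only for a deeper understanding of the molecular mechanisms underlying these diseases but also for drug design. At present, identifying these sites experimentally is labor-intensive, costly, and low-throughput, and some modifications remain difficult to detect at all. There is therefore an urgent need for computational methods that predict protein post-translational modification sites. This project studies new feature representations that integrate information from multiple protein data resources, uses them to build partially supervised prediction models that better fit the real problem, and develops an online prediction web server and an offline software package for large-scale prediction. The research mainly uses optimization and machine learning methods: it proposes new prediction models based on support vector machines and conditional random fields and investigates related theoretical questions such as model selection. The results are expected to give biologists effective computational models and practical software, to support further research in drug development, and to enrich research in the field of optimization.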

Chinese keywords: machine learning; protein modification; prediction; PU learning problem

English abstract: Protein post-translational modifications (PTMs) play a very important role in living organisms: they make protein structure more complex, protein function more complete, and regulation more specific. Increasing evidence indicates that abnormal PTMs occur in various major tumors and cancers. Therefore, accurately identifying PTM sites in proteins is very important both for understanding cellular mechanisms and for drug development. Experimental identification of PTM sites with a site-directed mutagenesis strategy is laborious and low-throughput because PTMs are labile and low in abundance. Given the avalanche of protein sequences generated in the post-genomic age, computational methods for timely and reliable identification of PTM sites in proteins are highly desired. In this project we propose a new positive-unlabeled (PU) predictive model based on new features constructed by incorporating various protein data sources. We will develop an online web server and an offline software package for large-scale prediction based on Linux and Java. We will mainly apply optimization and machine learning approaches to construct new prediction models based on support vector machines and conditional random fields. Furthermore, we will study optimization-theoretic problems such as model selection. In conclusi

English keywords: machine learning; post-translational modification; prediction; PU learning problem
