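Project title: Analysis and prediction of intrinsically disordered segments that bind to RNA, DNA, and proteins
Project number: No.11501407
Project type: Young Scientists Fund
Year of approval: 2016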
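Discipline: Mathematical and Physical Sciences and Chemistry
Project author: Peng Zhenling (彭珍玲)
Affiliation: Tianjin University
Funding: 180,000 CNY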
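Abstract (translated from Chinese): Intrinsically disordered proteins lack a stable three-dimensional structure in their native state, yet still carry out biological functions in the cell. They are widespread across species and take part in important cellular functions such as signal transduction and regulation. Studies have found that these functions are often realized through interactions between their intrinsically disordered segments and RNA, DNA, and proteins. However, modern experimental techniques have difficulty detecting intrinsically disordered segments and their functions. Meanwhile, existing work on predicting the binding sites of proteins with RNA, DNA, and other proteins has focused only on proteins with stable structures and has overlooked intrinsically disordered proteins. In view of this, the project addresses the prediction of intrinsically disordered segments that bind to RNA, DNA, and proteins. First, by systematically analyzing the sequence information of these three types of intrinsically disordered segments, it investigates the sequence features that distinguish them from ordinary amino acid segments, and divides those features into simple and complex classes according to how quickly they can be obtained. Second, using these two classes of features, it develops fast and accurate machine-learning-based computational methods, respectively, for predicting the three types of intrinsically disordered segments. Finally, it applies these methods at the proteome scale and builds a database covering these three functions of intrinsically disordered segments.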
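Keywords (translated from Chinese): Intrinsically disordered protein;Intrinsically disordered segment;Machine learning;Function prediction;Feature extraction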
English abstract: Intrinsically disordered proteins lack a stable 3D structure but still perform biological functions in vivo. They are very common in nature and carry out a variety of functions, including cell signaling and regulation. Previous studies suggest that they participate in these functions through interactions between their intrinsically disordered segments and other molecules, including RNA, DNA, and proteins. However, it is very difficult to detect intrinsically disordered segments, let alone their functions, experimentally. At the same time, much effort has gone into predicting the binding sites between proteins and RNA, DNA, and other proteins, but these studies are limited to proteins with a stable 3D structure and do not cover intrinsically disordered proteins. We are therefore motivated to find a way to detect potential disordered segments that bind to RNA, DNA, and proteins. Specifically, we systematically analyze the disordered segments involved in these three types of binding events and extract the sequence features that distinguish them from other segments. Based on how quickly these features can be obtained, we divide them into simple and complex features. Using these two types of features, we develop machine-learning-based computational methods that focus on prediction speed (i.e., fast prediction) and prediction quality (i.e., accurate prediction), respectively. These methods provide a vital and high-throughput way to predict the intrinsically disordered segments that interact with RNA, DNA, and proteins. Finally, we apply these methods at the proteome level to build a database of these three types of binding events mediated by intrinsic disorder.
English keywords: Intrinsically disordered protein;Intrinsically disordered segment;Machine learning;Function prediction;Feature extraction