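Project title: Mining of protein submitochondrial localization, its feature information, and prediction algorithms
Project number: No.61461038
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Radio electronics; telecommunication technology
Project author: Fan Guoliang (樊国梁)
Author affiliation: Inner Mongolia University (内蒙古大学)
Funding amount: 360,000 CNY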
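Chinese abstract: As proteomic data grow daily, research on protein localization within specific sub-organelles and on the associated functions has become more urgent. Building on a high-quality sub-database of protein submitochondrial localization data that includes multi-location proteins, the project analyzes in depth the sequence information and biological functions of proteins at different submitochondrial locations. It extracts amino acid sequence information, evolutionary information, characteristic motif information, and information on simple super-secondary structures related to functional domains for these proteins, and introduces for the first time two new feature parameters: chemical shift and protein stickiness. These feature parameters are combined rationally using feature fusion theory, and prediction methods such as the increment of diversity, the extended position-correlation matrix, and the covariant discriminant are combined with other machine learning methods to propose a more reasonable theoretical prediction algorithm for protein submitochondrial localization. Continuously improving the prediction algorithm and the method for selecting information parameters, and thereby raising predictive and generalization ability, leads to a better understanding of protein submitochondrial localization and its functions, helps determine the submitochondrial localization of uncharacterized proteins, and offers some guidance on feature parameter extraction and algorithms for other subcellular localization problems.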
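Chinese keywords: localization prediction; protein sequence analysis; multi-omics data fusion; biological information processing; multi-omics data mining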
English abstract: With the growth of proteomic data, the study of protein localization in specific sub-organelles and of the associated functions has become more urgent. Building on a high-quality sub-dataset of protein submitochondrial localization data that includes multi-location proteins, we will conduct an in-depth analysis of the sequence information and biological functions of proteins at different submitochondrial locations. We will extract amino acid sequence information, evolutionary information, characteristic motifs, and simple super-secondary structures related to functional domains, and introduce two novel feature parameters, namely chemical shift and protein stickiness. These feature parameters will be rationally combined using feature fusion theory, and a more rational prediction algorithm for protein submitochondrial localization will be proposed by combining the increment of diversity, the extended position possibility matrix (PPM), and the covariant discriminant with other machine learning methods. Continuous improvement of the prediction algorithm and of the selection of information parameters, together with enhanced predictive and generalization ability, should give a better understanding of the relationship between protein submitochondrial localization and function and help predict the submitochondrial localization of uncharacterized proteins. Furthermore, the work will provide guidance on the selection of feature parameters and algorithms for other subcellular localization problems.
English keywords: localization prediction; protein sequence analysis; multi-omics data fusion; bioinformatics processing; multi-omics data mining