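Project title: Research on methods for predicting the impact of non-synonymous single nucleotide variants on protein function
Project number: No.31471243
Project type: General Program
Year of approval: 2015
Discipline: Biological Sciences
Project author: 叶志强 (Ye Zhiqiang)
Affiliation: Peking University (北京大学)
Funding: 700,000 CNY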
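Chinese abstract (translated): The unprecedented development of next-generation sequencing has led to a rapid accumulation of genomic variation data, making it urgent to identify the variants that affect function. Developing computational methods to predict the impact of genomic variants, especially non-synonymous single nucleotide variants (nsSNVs), on protein function is a necessary route to meeting this need. After more than a decade of development, the field appears to have plateaued in prediction accuracy and has made little major progress in mining novel predictive attributes. This project proposes to optimise and innovate at several steps of such methods in order to break through the current impasse. Specifically, it will: emphasise comparing and selecting training datasets from the very first step, and then improve the automated pipeline for building multiple sequence alignments to raise the quality of sequence attributes; explore novel spatial attributes based on self-predicted protein structures, extending the range over which such attributes can be applied; search for the best way to partition the training data into several subsets with lower heterogeneity, selecting attributes and training prediction models on each subset separately; and combine prediction scores from other tools to build composite prediction models. Finally, a standalone prediction tool will be produced so that researchers can mine nsSNVs that cause functional change from massive datasets, helping to interpret possible disease mechanisms or the mechanisms of differential drug response.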
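Chinese keywords (translated): Bioinformatics algorithm; single nucleotide variant; sequence alignment; machine learning; protein structure prediction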
English abstract: With the unprecedented development of next-generation sequencing technology, genomic variation data have accumulated rapidly, creating an urgent demand to identify the variants that affect protein function. Meeting this demand requires computational methods that predict the functional impact of genomic variants on proteins, especially non-synonymous single nucleotide variants (nsSNVs). After more than 10 years of development, this field seems to have reached a plateau in prediction accuracy and has made little major progress in mining novel prediction attributes. In this proposal, we plan to optimise and innovate at several steps of this kind of method. First, we will emphasise comparing and selecting among the available training datasets, and will improve the automatic pipeline for multiple sequence alignment in order to raise the quality of sequence-related attributes. Second, we will explore novel spatial attributes based on predicted protein structures in order to expand the scope in which such attributes can be applied. Third, we will search for an optimal partition of the training data to obtain several subsets with lower heterogeneity, and select attributes and train prediction models on these subsets separately. Fourth, we will construct meta-models combining scores from other tools with our own. Finally, we will build a standalone prediction tool so that researchers can identify nsSNVs with functional impact from massive datasets, helping to interpret possible disease etiology or the mechanisms of differential drug effects.
English keywords: Bioinformatics Algorithm; Single Nucleotide Variant; Sequence Alignment; Machine Learning; Protein Structure Prediction