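Project title: Research on Protein Remote Homology Detection Methods Based on Evolutionary Information in Sequence Profiles
Project number: No.61300112
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation Technology; Computer Technology
Principal investigator: Liu Bin (刘滨)
Institution: Harbin Institute of Technology
Funding: CNY 230,000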
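Chinese abstract: Protein remote homology detection is an effective means of studying protein structure and function. Because remotely homologous proteins share low sequence similarity, current computational methods cannot detect protein remote homology accurately. Sequence profiles contain the evolutionary information captured in multiple sequence alignments, and extracting and exploiting this information is the key to improving prediction accuracy. Taking the extraction and use of profile-based evolutionary information as its starting point, this project explores new computational methods that combine biology, mathematics, natural language processing, and machine learning. The research covers: 1) generating profile-based protein representations by extracting the evolutionary information contained in sequence profiles; 2) detecting protein remote homology with natural language processing techniques, profile-based alignment algorithms, and multiple kernel learning, and searching for the protein building blocks that correspond to words in natural language, as well as the grammar rules of protein sequences; 3) mining the characteristic features of protein families with the help of biological background knowledge; 4) applying the proposed remote homology detection methods to protein fold recognition and protein interaction site prediction. In theory, this research can advance the study of the mapping among protein sequence, structure, and function; in application, it can promote progress in medicine and agriculture.
Chinese keywords: evolutionary information in sequence profiles;protein remote homology detection;natural language processing;protein family feature discovery;protein structure and function prediction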
English abstract: Protein remote homology detection is one of the key techniques for studying protein structure and function. Because remotely homologous proteins share low sequence similarity, currently available computational methods cannot detect protein remote homology accurately. Profiles contain evolutionary information extracted from multiple sequence alignments, so extracting and exploiting this information is crucial for accurate protein remote homology detection. This project will explore new computational methods for protein remote homology detection by combining the evolutionary information extracted from profiles with techniques and knowledge from several disciplines, including biology, mathematics, natural language processing, and machine learning. Our tasks are: 1) generating a novel profile-based protein sequence representation by extracting the evolutionary information from profiles; 2) applying natural language processing techniques, profile-based alignment algorithms, and multiple kernel learning to protein remote homology detection, and exploring the building blocks of proteins, analogous to words in human language, as well as the grammar rules of protein sequences; 3) exploring the features of protein families on a biological basis; 4) applying the proposed remote homology detection methods to protein fold recognition and protein interaction site prediction. In theory, this research can advance the study of the mapping among protein sequence, structure, and function; in application, it can promote progress in medicine and agriculture.
English keywords: evolutionary information;protein remote homology detection;natural language processing;identification of protein family features;prediction of protein function and structure