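Project title: Research on Ensemble Learning Methods for Protein-Protein Interaction Prediction
Project number: No.61300128
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Cao Zhi (曹智)
Affiliation: Hunan University
Funding: 250,000 CNY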
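Chinese abstract (translated): Proteins are the direct executors of biological function, and protein-protein interactions underlie all metabolic activity in the cell, so protein-protein interaction has become a focus of current proteomics research. Solving this problem provides a basis for analyzing protein function, exploring development, and developing effective drugs. Starting from protein sequence features, this project will construct a standard protein-protein interaction dataset, predict and evaluate protein-protein interactions, and ultimately build a high-quality protein-protein interaction network for applications such as identifying key genes and drug targets and discovering complexes and functional modules: classifying the 20 amino acids with fuzzy theory according to their physicochemical properties and, on top of this classification, combining distance frequency, L-Z complexity, character frequency features, and character position features to obtain feature values of protein sequence information, thereby extracting sequence information effectively; selecting positive samples from multiple protein-protein interaction databases and designing a negative-sample selection algorithm, thereby constructing a standard protein-protein interaction dataset; using an ensemble learning framework based on random subspaces and feature mapping to ensure the accuracy and generalization ability of the prediction model; assessing the reliability of protein-protein interactions with manifold learning to filter noise from the prediction results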
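Chinese keywords (translated): protein-protein interaction; protein sequence feature representation; ensemble learning; manifold learning;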
English abstract: Proteins are the direct performers of biological function, and protein interaction is the foundation of all cellular metabolic activities, so protein interaction has become a hot topic in current proteome studies. Solving this problem can lay the foundation for the analysis of protein function, the exploration of life development, and the development of drugs. Based on the characteristics of protein sequences, this project will build a standard protein-protein interaction dataset, predict and evaluate protein-protein interactions, and finally construct a high-quality protein interaction network that can be used to discover essential genes, drug targets, complexes, and functional modules: classifying the 20 amino acids according to their physicochemical properties and combining this classification with distance frequency, LZ complexity, character frequency, and character position features to obtain feature values of protein sequence information, and thereby effectively extract the protein sequence information; choosing positive samples from multiple protein interaction datasets and negative samples with our proposed method, then constructing a standard protein-protein interaction dataset; to ensure the accuracy and generalization ability, the meta
English keywords: protein interaction; protein sequence representation; ensemble learning; manifold learning;
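
Two of the sequence features named in the abstract, character frequency and L-Z complexity over a reduced amino acid alphabet, can be sketched as follows. This is a minimal illustration, not the project's method: the seven-group alphabet in `GROUPS` is a hypothetical stand-in for the fuzzy-theory classification (which the abstract does not specify), the function names are invented for this example, and the complexity measure is assumed to be the Lempel-Ziv (1976) phrase count.

```python
from collections import Counter

# Hypothetical 7-group reduction of the 20 amino acids by physicochemical
# similarity. The project derives its own grouping with fuzzy theory; this
# table is only an assumption made for illustration.
GROUPS = {
    "A": "AGV",
    "B": "ILFP",
    "C": "YMTS",
    "D": "HNQW",
    "E": "RK",
    "F": "DE",
    "G": "C",
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet, skipping unknown symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def char_frequency(reduced):
    """Relative frequency of each group label, in sorted label order."""
    labels = sorted(GROUPS)
    if not reduced:
        return [0.0] * len(labels)
    counts = Counter(reduced)
    return [counts[label] / len(reduced) for label in labels]


def lz_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text that
        # precedes its last symbol (overlapping copies are allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def sequence_features(seq):
    """Feature vector: group frequencies followed by the L-Z phrase count."""
    reduced = reduce_sequence(seq)
    return char_frequency(reduced) + [float(lz_complexity(reduced))]
```

For a protein pair, one plausible use is to concatenate `sequence_features` of the two sequences into a single vector for a downstream classifier; the distance frequency and character position features mentioned in the abstract are not covered here.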