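Project title: Research on Sparse Manifold Learning Methods for Complex Data
Project number: No.61272333
Project type: General Program
Approval year: 2013
Discipline: Automation technology; computer technology
Principal investigator: Lei Yingke (雷迎科)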
Affiliation: Electronic Engineering Institute of the Chinese People's Liberation Army (中国人民解放军电子工程学院)
Funding: 790,000 CNY
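Chinese abstract: This project targets complex data that are large-scale, high-dimensional, nonlinear, and noise-contaminated, and systematically studies manifold learning methods based on sparse representation together with their application to protein-protein interaction data. First, a non-negative sparse representation neighborhood graph construction model based on generalized correlation is designed, which effectively addresses the sensitivity to data noise and the difficulty of choosing the neighborhood scale parameter in manifold learning methods that build graphs with the K-nearest-neighbor or ε-ball criterion. Building on this, an intrinsic dimensionality estimation method for data manifolds based on minimum correlation entropy (相关熵) is proposed. Next, a sparse manifold learning method based on joint local and global preserving embedding is designed, and, for large-scale complex data, an efficient solution scheme is constructed based on a hybrid strategy of minimum subset cover and spectral regression. Finally, a robust denoising method for protein-protein interaction data based on large-scale sparse manifold embedding is proposed, offering a new way to detect false-positive and false-negative noise in large-scale protein-protein interaction networks. The project both advances basic theoretical research in machine learning and promotes its application in bioinformatics.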
Chinese keywords: manifold learning;sparse representation;neighborhood graph construction;protein-protein interaction;
English abstract: This project systematically studies sparse-representation-based manifold learning methods for analyzing complex data, including large-scale, high-dimensional, nonlinear, and noisy data, as well as their application to protein-protein interaction data. First, an efficient non-negative sparse neighborhood graph model based on generalized correlation is devised to overcome the drawbacks of classical manifold learning methods built on KNN or ε-ball graphs, such as the difficulty of tuning the graph neighborhood size and the sensitivity to noise. Based on this graph, a minimum-relative-entropy-based intrinsic dimensionality estimation method for data manifolds is developed. Second, we propose a novel sparse manifold learning method that seeks a low-dimensional manifold embedded in the ambient space by preserving both local and global geometric structure, while adopting minimum set cover and spectral regression techniques to make the method suitable for large-scale complex data. Building on the proposed method, we further develop a robust large-scale sparse manifold embedding method to assess the reliability of protein-protein interactions and predict new ones, which can be considered a promising new solution for detecting both false positive and false negative interactions in protein inter
English keywords: manifold learning;sparse representation;neighborhood graph construction;protein-protein interactions;