项目名称: 图上的快速分类、聚类算法研究
项目编号: No.61203296
项目类型: 青年科学基金项目
立项/批准年度: 2013
项目学科: 自动化学科
项目作者: 张燕明
作者单位: 中国科学院自动化研究所
项目金额: 24万元
中文摘要: 过去十五年中,机器学习和模式识别领域对基于图的学习算法给予了极大的关注,产生了大量有代表性的工作,如:非线性降维、谱聚类、基于图的半监督学习等。然而,典型的对向量数据构图方法的时间复杂性为O(n^2)(n为样本数量),典型的基于图的机器学习算法的时间复杂性为O(n^3),因此无法在大规模实际问题中使用。本研究以改进现有构图算法和基于图的学习算法的复杂性为目的,旨在设计实现具有线性O(n)或近似线性O(n*log^k(n))(k为常数)时间复杂性的图算法,从而满足千万以上规模数据的应用的需要。实现上,我们将以近似算法和随机算法为主要工具、以对图结构进行稀疏化近似为核心方法来设计快速的构图算法以及图上的聚类、分类算法。此外,我们还将详细分析算法的时间和空间复杂性以及近似算法产生的误差。最后,我们将探索这一研究在网页分类和图像分割等问题中的应用,从而推动基于图的机器学习方法的实用化。
中文关键词: 基于图的机器学习方法;半监督学习;大规模机器学习方法;;
英文摘要: In the past decade, researchers in the machine learning community have paid great attentions to graph-based learning methods. Many successful graph-based methods have been proposed for tasks such as semi-supervised learning, clustering, dimension reduction etc. However, typical graph construction methods have a running-time complexity of O(n^2) where n is the number of data points, and graph-based learning methods have a time complexity of O(n^3), thus, cannot be directly used for large-scale applications. In this research, we aim to address this computational challenge by designing graph-based methods with linear or near-linear complexity. Based on methods of graph sparsification, we approximate general graphs by a much simpler one, then design fast clustering and classification algorithms by tools of random algorithms and approximation algorithms. Using tools of spectral graph theory, we will analyze the error bound of the approximation and time complexity of the proposed algorithm. Finally, we will apply our methods to real world applications such as image segmentation and webpage classification.
英文关键词: graph-based machine learning;semi-supervised learning;large-scale algorithm;;