Research on Structural Optimization Methods for Large-Scale Machine Learning Problems

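Project title: Research on Structural Optimization Methods for Large-Scale Machine Learning Problems

Project number: No.61273296

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Project author: 陶卿 (Tao Qing)

Affiliation: 中国人民解放军陆军军官学院 (Army Officer Academy of the Chinese People's Liberation Army)

Funding: 830,000 yuan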

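Chinese abstract (translated): Machine learning faces a severe challenge from the ever-growing scale of data, and how to handle large-scale and even ultra-large-scale data is a key scientific problem that statistical learning urgently needs to solve. The training sets of large-scale machine learning problems are often redundant and sparse, and the regularization term and loss function of a machine learning optimization problem also carry specific structural meaning. Batch black-box methods that directly use the gradient of the entire objective function not only struggle with large-scale problems but also fail to meet machine learning's structural requirements. Coordinate optimization, online, and stochastic optimization algorithms, which have developed rapidly under the drive of machine learning's own characteristics, have become effective means of solving large-scale problems. This project mainly studies large-scale optimization algorithms that fully exploit the structure of the training data and effectively preserve the structure of the machine learning problem, in particular coordinate optimization, online, and stochastic optimization algorithms for regularized loss function optimization problems. The work includes developing online and stochastic algorithms based on new optimization principles, proposing online and coordinate optimization algorithms that preserve the structure of the loss function, and obtaining coordinate optimization algorithms for solving regularized non-smooth losses, among others.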

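Chinese keywords (translated): Machine learning; regularized loss function problems; structural optimization; stochastic optimization; coordinate descent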

English abstract: Machine learning is facing a great challenge arising from the ever-increasing scale of data. How to cope with large-scale and even huge-scale data is a key problem in the emerging area of statistical learning. Usually, the training set of a large-scale problem exhibits redundancy and sparsity, and the regularizer and loss function of a learning problem carry structural implications. If we straightforwardly employ gradient-type black-box methods in batch settings, not only can the large-scale problems not be solved, but the structural information implied by machine learning also cannot be exploited. Recently, state-of-the-art scalable methods such as coordinate descent, online, and stochastic algorithms, which are driven by the characteristics of machine learning, have become the dominant paradigm for large-scale problems. This project is devoted to scalable optimization algorithms that can not only sufficiently exploit the structure of training sets but also effectively preserve the structure of learning problems. In particular, we study coordinate descent, online, and stochastic algorithms for the minimization of regularized loss problems. The main content of this project includes the online and stochastic algorithms based on new optimization principles, the coordinate descent a

English keywords: Machine Learning; Regularized Loss Problems; Structural Optimization; Stochastic Optimization; Coordinate Descent
