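Project title: Machine Learning Theories and Methods for Big Data
Project number: No.61332007
Project type: Key Program
Year of approval: 2014
Discipline: Automation technology; computer technology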
Project author: Zhu Xiaoyan (朱小燕)
Author affiliation: Tsinghua University (清华大学)
Project funding: 3 million yuan
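Chinese abstract: The massive data produced by the growth of the Internet is driving new trends in scientific and economic development and raising new challenges. Big data is noisy, structurally complex and diverse, and fast-changing; neither the traditional observation-hypothesis-testing scientific method nor existing data-driven theories and methods based on probability and statistics can cope with these characteristics. A new set of computational theories and methods for data-intensive settings is urgently needed in order to process big data effectively and to discover useful information in it in a timely manner. To explore such theories and methods, this project proposes three research topics: 1) learning theories and methods for multi-granularity latent-layer representations, to fully uncover the essential regularities and properties underlying big data; 2) adaptive learning methods and strategies for the big data environment, to cope with the fast and diverse changes in big data; 3) an application platform for large-scale image content analysis and understanding, to verify the effectiveness of the underlying theories and methods. By combining theory and application, the project aims to develop and establish machine learning theories and methods suited to complex big data environments. It expects to achieve breakthroughs in key technologies such as multi-granularity latent-layer representation learning and adaptive learning for big data processing, and to deliver a prototype system platform for intelligent processing of massive image and video data and for information services.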
Chinese keywords: machine learning; granularity; deep learning; adaptive learning; abstract knowledge
English abstract: With the fast growth of the Internet, Big Data is becoming a new trend at the frontier of technological and economic development and has raised new challenges for scientists. Because of its high noise, great structural variety, and rapid evolution, Big Data is hard to handle using either traditional methods that follow the 'observation-hypothesis-testing' paradigm of scientific research or existing data-driven methods grounded in probability and statistical theory. Therefore, to analyze Big Data effectively and discover the underlying useful information in a timely manner, it is imperative to develop a new set of machine learning theories and methods that can meet the requirements of data-intensive analysis tasks. To systematically investigate and build such a set of theories and methods, this project proposes three themes of work: 1) develop new theories and computational methods for learning hierarchical latent representations that reveal the essential properties and patterns underlying Big Data; 2) develop machine learning algorithms and strategies that adapt automatically to the fast and diverse changes in Big Data; and 3) apply the new learning theories and algorithms to image content analysis and understanding, and develop a prototype platform to demonstrate their effectiveness. Through a systematic investigation that combines theory with practical application, this project aims to develop a new set of machine learning theories and algorithms that can deal with Big Data effectively. With the accomplishments of this project, we expect to make breakthrough contributions to key technologies of hierarchical latent representation learning and adaptive learning in the Big Data environment, and to develop a prototype platform for intelligent image and video content analysis and information services.
English keywords: machine learning; granularity; deep learning; adaptive learning; abstract knowledge