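Project title: Active Learning Methods for Non-Identically Distributed Data
Project number: No.61502117
Project type: Young Scientists Fund project
Approval year: 2016
Discipline: Other
Project author: 吴伟宁 (Wu Weining)
Affiliation: Harbin Engineering University
Funding: 210,000 CNY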
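Chinese abstract: How to exploit large amounts of unlabeled data to improve the generalization performance of classification models is one of the most closely watched problems in current machine learning and pattern recognition research. Active learning makes effective use of the latent information in unlabeled data and reduces the cost of the accurate annotation needed to build a training set, which has made it one of the mainstream approaches to this problem. However, traditional active learning often relies on idealized assumptions when selecting examples and acquiring their labels, and these assumptions limit its effectiveness. Non-identically distributed data are characterized by dynamic distributions, large scale, and noisy annotations; this project relaxes the comparatively strict assumptions of traditional active learning and studies active learning for such data. Specifically, the project studies active sampling strategies for dynamically non-identically distributed data, overcoming the limitation of the identical-distribution assumption; it computes example uncertainty with a locality-sensitive hashing index to make example selection more efficient; and, for noisy annotations, it actively estimates the correct labels of the selected examples to further reduce the impact of annotation noise. Finally, an application to a visual object-to-category retrieval system verifies the effectiveness of the proposed active learning methods on image retrieval tasks and demonstrates their performance advantages.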
Chinese keywords: machine learning; active learning; sampling strategy; annotation estimation
English abstract: In current machine learning and pattern recognition research, how to use large amounts of unlabeled data to improve the generalization ability of classifiers has attracted extensive attention. Active learning has become one of the main approaches to this problem because it fully exploits the latent information in unlabeled data while reducing the annotation cost of constructing training sets. However, the application of active learning algorithms is restricted by idealized assumptions made during example selection and annotation querying. In this project, considering the dynamic distributions, large scale, and noisy annotations of such data, we plan to develop active learning algorithms that relax these strict assumptions of existing work. We plan to study an active sampling strategy for dynamically non-identically distributed data in order to overcome the limitation of the identical-distribution assumption. We also compute example uncertainty using a locality-sensitive hashing index in order to increase sampling efficiency. Then, under noisy annotations, we estimate the correct labels of the selected examples in order to further reduce the impact of noise. Finally, we apply these active learning algorithms to a real object-to-category retrieval task to validate their effectiveness.
English keywords: machine learning; active learning; sampling strategy; annotation estimation
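The abstract states that example uncertainty is computed with a locality-sensitive hashing index but does not give the construction. The following is a minimal sketch under an explicit assumption: it substitutes a two-bit hyperplane hash, one known LSH family for retrieving points close to a hyperplane, so that for a linear classifier with weight vector w the pool examples nearest the decision boundary w·x = 0 (the most uncertain ones) can be found without scoring the whole pool. The class name `HyperplaneHashIndex` and its parameters (`n_bits`, `n_tables`, `seed`) are hypothetical and are not taken from the project.

```python
import numpy as np
from collections import defaultdict


class HyperplaneHashIndex:
    """Hypothetical LSH index that retrieves pool examples likely to lie near a
    linear decision boundary w.x = 0, i.e. the most uncertain examples."""

    def __init__(self, dim, n_bits=3, n_tables=10, seed=0):
        rng = np.random.default_rng(seed)
        # Each table draws n_bits pairs of Gaussian projections (u, v).
        self.U = rng.standard_normal((n_tables, n_bits, dim))
        self.V = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.X = None

    def _keys(self, z, is_query):
        """Two-bit hyperplane hash per projection pair.
        Data point x -> [sign(u.x), sign(v.x)];
        query normal w -> [sign(u.w), sign(-v.w)].
        Per pair, the collision probability is (1 - t/pi) * (t/pi), where t is
        the angle between x and w; it peaks at t = pi/2 (x on the boundary)."""
        keys = []
        for U, V in zip(self.U, self.V):
            first = U @ z >= 0
            proj_v = V @ z
            second = (-proj_v if is_query else proj_v) >= 0
            keys.append(tuple(np.concatenate([first, second]).tolist()))
        return keys

    def build(self, X):
        """Hash every unlabeled example into all tables."""
        self.X = np.asarray(X, dtype=float)
        for i, x in enumerate(self.X):
            for table, key in zip(self.tables, self._keys(x, is_query=False)):
                table[key].append(i)
        return self

    def query(self, w, batch_size):
        """Return up to batch_size pool indices with the smallest normalized
        margin |w.x| / ||x||, scoring only examples in colliding buckets."""
        w = np.asarray(w, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._keys(w, is_query=True)):
            candidates.update(table.get(key, []))
        if len(candidates) < batch_size:
            # Too few collisions: fall back to a full scan of the pool.
            candidates = set(range(len(self.X)))
        idx = np.array(sorted(candidates))
        X_c = self.X[idx]
        margins = np.abs(X_c @ w) / (np.linalg.norm(X_c, axis=1) + 1e-12)
        order = np.argsort(margins, kind="stable")
        return idx[order[:batch_size]].tolist()
```

In an active learning loop the index would be built once over the unlabeled pool, and each round only the retrained classifier's weights are hashed to query it. A bias term can be handled by appending a constant feature to every example.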
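The abstract also mentions actively estimating the correct labels of selected examples under noisy annotation, again without specifying the estimator. As an illustrative assumption, the sketch below uses iterative reliability-weighted majority voting over several annotators: each annotator's accuracy is re-estimated from its agreement with the current label estimates and converted into a log-odds vote weight, in the spirit of (but much simpler than) Dawid–Skene EM. The function name `estimate_labels` and its parameters are hypothetical.

```python
import numpy as np


def estimate_labels(annotations, n_classes, n_iter=10, eps=1e-3):
    """Hypothetical label estimator for noisy multi-annotator data.

    annotations: (n_items, n_annotators) integer array; -1 marks a missing label.
    Returns (estimated labels, per-annotator accuracy estimates).
    Items with no annotations at all default to class 0.
    """
    ann = np.asarray(annotations)
    n_items, n_annot = ann.shape
    observed = ann >= 0
    weights = np.ones(n_annot)  # first pass is plain majority voting
    labels = None
    for _ in range(n_iter):
        # Weighted vote: each annotator adds its weight to the class it chose.
        votes = np.zeros((n_items, n_classes))
        for j in range(n_annot):
            rows = np.flatnonzero(observed[:, j])
            votes[rows, ann[rows, j]] += weights[j]
        new_labels = votes.argmax(axis=1)
        # Accuracy = agreement rate with the current label estimates.
        agree = (ann == new_labels[:, None]) & observed
        acc = agree.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)
        acc = np.clip(acc, eps, 1 - eps)
        # Log-odds vote weight under a symmetric (one-coin) noise model.
        weights = np.log((n_classes - 1) * acc / (1 - acc))
        if labels is not None and np.array_equal(labels, new_labels):
            break  # estimates stopped changing
        labels = new_labels
    return labels, acc
```

The weight log((K-1)·acc/(1-acc)) is the log-likelihood ratio of a single vote when an annotator is correct with probability acc and otherwise picks one of the K-1 wrong classes uniformly, so an annotator whose accuracy falls below chance (1/K) ends up with a negative weight.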