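Project title: Research on Active Learning Methods for Imbalanced Classification Tasks
Project number: No.61305058
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Yu Hualong (于化龙)
Affiliation: Jiangsu University of Science and Technology (江苏科技大学)
Funding: 230,000 yuan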
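Chinese abstract (translated): Active learning is an important research direction in machine learning and data mining. By actively selecting which examples to learn from, it can reduce the sample complexity of a learning algorithm and thereby reduce the cost of manual labeling. However, when traditional active learning algorithms are applied to imbalanced classification tasks, their learning process may be affected by the imbalanced distribution of unlabeled samples, making it difficult to obtain satisfactory learning performance. Based on the respective characteristics of two types of imbalanced classification tasks, pool-based and stream-based, this project starts from three key steps that affect active learning performance: the selection of query samples, the balance control of the learning process, and the determination of the stopping condition. It studies effective strategies for alleviating the influence of imbalanced sample distributions and then proposes active learning algorithms suited to imbalanced classification tasks. In addition, based on the structural characteristics of multiclass imbalanced classification tasks, the project will extend the existing research results and propose targeted active learning algorithms for multiclass imbalanced classification. The research results are expected to find practical application in fields such as financial fraud detection, network intrusion detection, spam filtering, text classification, video surveillance, and bioinformatics, and therefore have considerable theoretical and practical value.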
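Chinese keywords (translated): class imbalance learning; active learning; uncertainty measurement; decision output compensation; ensemble learning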
English abstract: Active learning is one of the major research fields in machine learning and data mining. By actively selecting the samples to learn from, it can reduce sample complexity and, in turn, the cost of human labeling. However, traditional active learning algorithms often fail to achieve satisfactory classification performance on skewed classification tasks, because their learning process is disrupted by the imbalanced distribution of unlabeled samples. This project will first analyze the characteristics of pool-based and stream-based imbalanced classification tasks, respectively. It will then study strategies to alleviate the effect of class imbalance from three aspects, which correspond to three key procedures in active learning: query sample selection, balance control, and the stopping decision. Building on this work, an effective active learning algorithm specifically designed for imbalanced classification tasks with unlabeled samples can be proposed. Furthermore, the project will investigate the structural characteristics of multiclass imbalanced classification tasks and present effective active learning algorithms for them. The research findings can be widely applied in many real-world fields, including financial fraud detection, network intrusion detection, spam filtering, video monitoring, and bioinformatics, thus this resear
English keywords: class imbalance learning; active learning; uncertainty measurement; decision output compensation; ensemble learning