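Project title: "New Classes Discovery" Learning and Its Applications
Project number: No.61473087
Project type: General Program (面上项目)
Year of approval: 2015
Discipline: Other
Project author: Liu Xuying (刘胥影)
Affiliation: Southeast University (东南大学)
Project funding: CNY 840,000 (84万元)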
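Chinese abstract (translated): New Classes Discovery is a new type of machine learning problem proposed by this project. Its goal is to use labeled data from known classes to discover new classes in unlabeled data while optimizing classification performance across all classes. The problem arises from a real-world task, the discovery of new microbial species (metagenomic binning): assigning the genes of all microbes in a community to species when the vast majority of those microbes have not yet been discovered. The project studies New Classes Discovery and its applications in depth from six aspects: (1) an algorithm that uses data from known classes to discover new classes and optimizes classification performance; (2) an algorithm that combines data structure information with supervised information; (3) a fast and efficient algorithm that exploits class correlations to handle a large number of classes; (4) an algorithm that effectively discovers new minority classes; (5) a binning algorithm that learns from the biological taxonomy tree, applied to real problems in a 973 project; (6) a multi-task-based algorithm, applied to the dynamic study of community structure in the 973 project. The project is expected to publish 8-10 high-quality papers in international journals and conferences and in top-tier domestic journals, and to apply for 2 national invention patents and 1 software copyright.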
Chinese keywords (translated): machine learning;new classes;many classes;class imbalance;multi-task learning
English abstract: The project proposes New Classes Discovery, a new kind of machine learning problem. The learning target is to discover new classes in unlabeled data using labeled data from known classes, and to optimize classification ability over all classes. The problem stems from a real-world task, the discovery of unknown microbial species (the binning problem in metagenomics), which requires classifying the mixed gene sequences of all species in a microflora even though most of those species are as yet undiscovered. The project studies New Classes Discovery and its applications from six aspects: (1) proposing an algorithm that uses labeled data to discover new classes in unlabeled data and optimizes classification ability; (2) proposing an algorithm that combines data structure information with supervised information; (3) proposing an algorithm that exploits class correlations to handle many classes efficiently and effectively; (4) proposing an algorithm that discovers new minority classes effectively, and an algorithm that does so when there are many classes; (5) proposing a taxonomy-based algorithm for the binning problem and applying it to real-world problems in an ongoing 973 project; (6) proposing a multi-task-based algorithm for dynamic analysis of microflora in the 973 project. The project is expected to publish 8-10 high-quality papers in international journals and conferences and in top-level national journals, to apply for 2 patents and 1 software copyright, and to train several graduate students.
English keywords: machine learning;new class;many classes;class imbalance;multi-task