Research on Machine Learning with Indeterminate Classification Taxonomy

Project title: Research on Machine Learning with Indeterminate Classification Taxonomy

Project number: No.61473274

Project type: General Program

Year of initiation/approval: 2015

Project discipline: Other

Project author: Luo Ping (罗平)

Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences

Project funding: 800,000 CNY

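Chinese abstract: Traditional machine learning research usually assumes that the classification taxonomy of the data is clear and fixed: the classification criteria are constant, and the training and test samples share the same classes. As machine learning techniques move into practical use, more and more tasks inevitably face an indeterminate classification taxonomy. In such problems, the decision attributes (class labels) of the data may vary with parameters, the condition attributes (observed features) may vary with parameters, and the set of sample classes may grow. This project considers machine learning under an indeterminate classification taxonomy from three aspects: parameterized decision attributes, parameterized condition attributes, and an increasing number of classes. It intends to analyze theoretically how these factors affect learnability and to propose machine learning methods that jointly learn the classification taxonomy and the classification model. Motivated by data that are large in volume and accumulate continuously, it will propose online learning methods for these problems, and it will seek to validate them in real applications (for example, quantitative stock trading). Based on this work, the project will publish 10-15 papers in leading domestic and international journals (TKDE, TKDD, ML, etc.) and top conferences (KDD, ICML, IJCAI, ECML, CIKM, ICDM, etc.).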

Chinese keywords: Machine learning; indeterminate classification taxonomy; online learning; generalization; learning theory

English abstract: Conventional machine learning research usually assumes that the data taxonomy is clear and stable. However, with the wide use of machine learning techniques in real-world applications, we encounter more and more tasks where the data taxonomy cannot be determined in advance. In these tasks, the class labels of the instances may change with the parameter settings of the data taxonomy, the features of the instances may change with the feature parameters, and new class labels may appear in the test data. In this project we therefore consider the Indeterminate Classification Taxonomy in terms of parameterized class labels, parameterized features, and augmentable class labels, and aim to analyze theoretically how these factors influence learnability. Specifically, we will propose methods that learn the classification taxonomy and the classification model jointly. In addition, because big data accumulate continuously, we will extend these problems to the online learning paradigm. Finally, the proposed methods will be applied to real-world applications (e.g., quantitative trading) for practical evaluation. We expect this project to produce 10-15 high-quality papers published in prestigious journals (e.g., TKDE, TKDD, ML) and top conferences (e.g., KDD, ICML, IJCAI, NIPS, ECML, CIKM, ICDM).

English keywords: Machine Learning; Indeterminate Classification Taxonomy; Online Learning; Generalization; Learning Theory
