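Project title: When Machine Intelligence Meets Human Computation: Research on Crowdsourcing-Based Classification Data Mining Techniques
Project number: No. 71301071
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Management Science
Project author: Xu Kaiquan (许开全)
Affiliation: Nanjing University
Funding: 230,000 yuan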
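Chinese abstract: Because much big data is unclassified, unlabeled raw data, the rich commercial value it contains is hard to exploit. The biggest obstacle to applying classification mining techniques to big data is an extreme shortage of labeled training samples. Crowdsourcing, a new form of human computation, can label data at low cost and with high efficiency. This project explores the theoretical and technical challenges that must be solved to use crowdsourced labels effectively in classification mining, so that big data can be classified at low cost and in a timely manner. It will study classification models that integrate crowdsourcing, using the inaccurate and redundant training samples produced by crowdsourced labeling to achieve good classification performance. It will also study active learning methods that integrate crowdsourcing, efficiently selecting samples, labelers, and labeling strategies so that crowdsourcing is used more effectively and yields better classification mining. The research is expected to enrich the theory of classification models and of active learning in data mining, and may open up a new direction for active learning.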
Chinese keywords: data mining; business intelligence; classification techniques; crowdsourcing
English abstract: Because most big data is raw data, its commercial value is hard to exploit. The biggest obstacle to applying classification techniques to big data is the lack of labeled data to serve as training samples. Crowdsourcing, a new form of human computation, can label data at very low cost and with high efficiency. This project will explore the theoretical and technical challenges of using crowdsourced labels in classification, in order to mine big data at low cost and with high efficiency. It will study classification models that integrate crowdsourcing, using the inaccurate and redundant training samples that crowdsourcing produces to achieve good classification performance. It will also study active learning methods that integrate crowdsourcing, efficiently selecting samples, labelers, and labeling strategies to achieve good performance. The study will enrich the theory of classification models and of active learning, and is expected to open up a new direction in active learning.
English keywords: data mining; business intelligence; classification; crowdsourcing