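Project title: Key Technologies for Massive Crowdsourced Data Management
Project number: No. 61472141
Project type: General Program
Year of approval: 2015
Discipline: Automation technology, computer technology
Principal investigator: Wang Xiaoling (王晓玲)
Affiliation: East China Normal University
Funding: 800,000 CNY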
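Chinese abstract: Crowdsourcing (for example, Amazon's AMT) is an emerging business model that solicits new data or new viewpoints through online communities. Crowdsourced data complements DBMS data. How to combine the closed world (the premise on which a DBMS is built) with the open world (the source of crowdsourced data), and thereby extend the breadth and depth of the data held in a DBMS, is a current research focus. However, the big data produced by crowdsourcing applications is incomplete, subjective, and noisy, which increases the complexity and difficulty of data management. This project addresses fundamental research problems in crowdsourced data management that are distilled from real applications. Starting from an analysis of DB-hard problems (data that requires understanding and subjective analysis), and taking a relational-crowd data engine as the core, we plan to explore the theory and key technologies of massive crowdsourced data management. The research covers the modeling and organization of crowdsourcing tasks, the querying and analysis of crowdsourced data, adaptive personalized search, and online detection, providing a theoretical foundation and techniques for data management in new crowdsourcing applications. We will also develop a prototype crowdsourced data management platform for MOOC applications and explore a tool stack with independent intellectual property rights to support real-world applications.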
Chinese keywords: Massive Data Management;Crowdsourced Data;Query Processing;Personalized Retrieval;Data Analysis
English abstract: Crowdsourcing is a new business model: the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially from an online community, rather than from traditional employees or suppliers. Amazon's AMT platform is one example of a crowdsourcing application. Crowd data is a typical kind of big data that is subjective and noisy, which makes it difficult to manage and process. Nevertheless, crowd data is useful to a DBMS, and how to combine the DBMS's closed world with the crowd's open world is a key question in recent research. This project starts from an analysis of DB-hard problems, including missing data and subjective analysis. The goal is to explore a relational-crowd data engine with respect to data quality and data analysis. The topics include the data model, query and analysis, personalized search, online detection, optimization for crowd data, and task scheduling strategies. We aim to provide new solutions and techniques for DB-hard problems by taking advantage of crowd data. A prototype for MOOC applications will be implemented to verify our methods and support real applications.
English keywords: Massive Data Management;Crowd Data;Query Processing;Personalized Search;Data Analysis