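Project Title: Research on User-Oriented Data Quality Management Methods
Project Number: 61472263
Project Type: General Program
Approval Year: 2015
Discipline: Automation Technology; Computer Technology
Project Author: Zhou Xiaofang (周晓方)
Affiliation: Soochow University (苏州大学)
Funding: CNY 830,000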
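Chinese Abstract: In the big data era, high-quality data has become an important resource and asset for governments, enterprises, research institutions, and society. However, as data volumes keep growing rapidly, people gain access to richer and more diverse data but must also confront a series of new challenges in data quality management. There is therefore an urgent need for data quality management that can handle massive, dynamic, multi-source, heterogeneous data across the entire data life cycle, and in particular for differentiated data quality assurance mechanisms centered on user requirements. This project will study a general-purpose data quality management mechanism driven by user-defined requirements. It focuses on general and scalable data quality management mechanisms and methods that can flexibly adapt to data quality standards from different domains with different requirements, and that let users describe personalized data quality requirements in a flexible, non-procedural way. For large-scale, dynamic, complex data, the project plans to use data mining techniques to derive quality-related feature descriptions and measures bottom-up from the underlying data, and to apply constraint checking and data cleaning top-down from high-level data quality definitions, ultimately providing data quality guarantees comparable to those that constraints provide in relational databases.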
Chinese Keywords: data quality management; database; data quality evaluation; data provenance; big data
English Abstract: High-quality data has become a valuable resource and asset for governments, research organizations, and society in the big data age. With the proliferation of large-scale data in every walk of life, people can now access and use diverse data services; on the other hand, data quality problems are now exposed at a much wider and more critical level. It is thus important to investigate data quality management for massive, dynamic, heterogeneous data across the whole data life cycle, and in particular to adapt to users' specific data quality requirements. This project aims to investigate a generalized data quality management mechanism for the big data age, with the goal of finding data quality management solutions with strong generalization and scalability. In this way, data quality standards and requirements from different application domains can be accommodated automatically and smoothly. In particular, we will design a declarative data quality specification language to support flexible, non-procedural quality descriptions. For large-scale complex data, data mining techniques will further be used to summarize the key quality-related features and measures of the data. At the database level, we will evaluate and improve data quality based on the specification language, and ultimately achieve data quality assurance comparable to integrity constraint support in relational database systems.
English Keywords: Data Quality Management; Database; Data Quality Evaluation; Data Provenance; Big Data
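The abstracts propose that users state data quality requirements declaratively, and that these requirements then be checked much like integrity constraints in a relational database. The record does not describe the project's actual specification language, so the following is only a minimal Python sketch under stated assumptions: the `Rule` type, the rule constructors `not_null`, `in_range`, and `functional_dependency`, and the `evaluate` and `clean_ratio` helpers are all hypothetical illustrations, not part of the project. The user declares what must hold; the evaluator decides how to check it and reports the violating rows.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical illustration only: not the project's specification language.
Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A declarative quality rule: a readable name plus a check that
    returns the indices of rows violating the rule."""
    name: str
    check: Callable[[list[Record]], list[int]]


def not_null(col: str) -> Rule:
    # Completeness: the column must be present and non-null.
    return Rule(f"not_null({col})",
                lambda rows: [i for i, r in enumerate(rows) if r.get(col) is None])


def in_range(col: str, lo: float, hi: float) -> Rule:
    # Validity: non-null values must lie in [lo, hi]; nulls are left to not_null.
    return Rule(f"in_range({col},{lo},{hi})",
                lambda rows: [i for i, r in enumerate(rows)
                              if r.get(col) is not None and not lo <= r[col] <= hi])


def functional_dependency(lhs: str, rhs: str) -> Rule:
    # Consistency: rows that agree on lhs must agree on rhs (lhs -> rhs).
    # A row is flagged if its rhs differs from the first rhs seen for its lhs key.
    def check(rows: list[Record]) -> list[int]:
        first_seen: dict[Any, Any] = {}
        violations = []
        for i, r in enumerate(rows):
            key = r.get(lhs)
            if key not in first_seen:
                first_seen[key] = r.get(rhs)
            elif first_seen[key] != r.get(rhs):
                violations.append(i)
        return violations
    return Rule(f"fd({lhs}->{rhs})", check)


def evaluate(rows: list[Record], rules: list[Rule]) -> dict[str, list[int]]:
    # The user states what must hold; the evaluator decides how to check it.
    return {rule.name: rule.check(rows) for rule in rules}


def clean_ratio(rows: list[Record], report: dict[str, list[int]]) -> float:
    # A simple quality measure: fraction of rows that violate no rule.
    if not rows:
        return 1.0
    violating = set().union(*report.values())
    return 1 - len(violating) / len(rows)


rows = [
    {"zip": "215006", "city": "Suzhou", "age": 30},
    {"zip": "215006", "city": "Suzhou", "age": 200},
    {"zip": "215006", "city": "Wuxi", "age": None},
]
rules = [not_null("age"), in_range("age", 0, 150), functional_dependency("zip", "city")]
report = evaluate(rows, rules)
print(report)
print(clean_ratio(rows, report))
```

Here the report is `{'not_null(age)': [2], 'in_range(age,0,150)': [1], 'fd(zip->city)': [2]}`, and one of the three rows is clean. `in_range` deliberately ignores nulls so that each defect is attributed to a single rule. This sketch scans rows in memory; massive, dynamic data as described in the abstracts would presumably need incremental or distributed evaluation instead.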