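Project title: Research on Online Assessment Techniques for Uncertain Web Data Quality
Project number: No.61003040
Project type: Young Scientists Fund project
Year of approval: 2011
Project discipline: Light industry, handicrafts
Project author: Han Jingyu (韩京宇)
Author affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)
Project funding: CNY 70,000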
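Chinese abstract: Web data quality assessment is the starting point of Web data management. To assess Web data quality online, we propose modeling the quality of uncertain Web data, training models and extracting information along the temporal and spatial dimensions, and obtaining a quality profile online. Specifically, the evolution of Web data quality is characterized by a stochastic process model, and online assessment with the trained model yields a quality score along the temporal dimension. By fusing massive data from different websites and extracting the perfect expression of each fact, a quality knowledge base is built; during online assessment, Web data are compared against these perfect expressions to yield a quality score along the spatial dimension. This research addresses the disordered quality of rapidly expanding Web data and is a starting point for fundamentally eliminating the "data rich, information poor" condition of the Web. It combines the latest research results in database technology, data mining, information retrieval, and machine learning, and thus starts from a high academic level. The results can be applied directly to information retrieval, Web data integration, e-government, and related fields, yielding direct benefits.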
Chinese keywords: Web;data quality;uncertainty;quality profile;online assessment
English abstract: Web data quality assessment is the first step toward managing web data effectively. To assess web data quality online, we propose modeling the quality of uncertain web data, collecting metadata and training models along the time and space dimensions, and thereby obtaining a web quality profile online. Specifically, by modeling the evolution of web data quality, we can assess data quality online from the data's history. Using a quality knowledge base, constructed by collecting all related data and synthesizing the perfect description of each entity, we can assess data quality online by comparing data with its perfect description. This research aims to find solutions that effectively and efficiently identify low-quality web data, which is emerging at a rapid pace. It borrows ideas from many fields, such as databases, data mining, information retrieval, and machine learning. The outcome can be applied in many areas, such as information retrieval, data integration, and e-commerce, and can have a direct impact on the economy and society.
English keywords: Web;data quality;uncertainty;quality profile;online assessment