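Project title: Research on Web Temporal Consistency for Automatic Discovery of Outdated Information
Project number: No.61272109
Project type: General Program
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: Li Shijun (李石君)
Affiliation: Wuhan University (武汉大学)
Funding: CNY 800,000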
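Chinese abstract: Outdated information is widespread on the Web and seriously degrades the quality of Web information. At present the problem is handled mainly by manual inspection, and no systematic theory exists; theories and methods for automatically discovering outdated Web information are urgently needed. The scientific problem that can be distilled from this phenomenon is "maintaining Web temporal consistency", and its challenges are the semantic understanding and extraction of temporal information and the complex constraint relations among temporal information. This project studies a Web temporal object model that adds temporal elements to Web content elements, modeling the content and temporal attributes of sites, columns, sub-columns, and pages uniformly with a hierarchical tree; methods for automatically extracting and evaluating each temporal element using temporal feature words; the complex constraint relations that columns, sub-columns, and pages in the Web temporal object model must satisfy to remain temporally consistent; and the reasoning mechanism and algebraic system for inferring unknown temporal information from the known temporal information in the model. On this basis the project will establish a theoretical framework for Web temporal consistency and propose methods and tools for automatically discovering outdated Web information. These have important application prospects in the automatic discovery and ranking of outdated pages within a website, the quality ranking of similar websites, and time-aware search ranking, and they can greatly reduce manual effort and improve Web information quality.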
Chinese keywords: Temporal Web; temporal consistency; Web data quality; outdated information discovery; inconsistency detection
English abstract: Outdated information is prevalent on the Web and is one of the main reasons for poor Web information quality. At present this problem is addressed mainly by manual inspection, both in China and abroad, and no systematic theory has been formed. Theories and methods for automatically discovering outdated information are therefore urgently needed. The scientific issue behind this phenomenon is how to maintain Web temporal consistency; its challenges are the semantic comprehension and extraction of temporal information and the complex constraint relations within it. This project will therefore focus on the following items: a Web temporal object model that adds temporal elements to Web content elements; a unified hierarchical tree model of the content and temporal attributes of websites, columns, sub-columns, and pages; approaches for automatically extracting and assessing temporal elements using temporal feature words; the complex constraint relations that columns, sub-columns, and pages in the temporal object model must observe to remain temporally consistent; and the reasoning mechanism and algebraic system for inferring unknown temporal information from known temporal information. We will build a theoretical framework for Web temporal consistency and present methods and tools to discover outdated Web information automatically.
English keywords: Temporal Web; Temporal Consistency; Web data quality; Outdated information; Inconsistency detection