Project title: Research on the Automatic Discovery and Integration of Entries to Massive Deep Web Data Sources
Project number: No. 61472296
Project type: General Program
Approval year: 2015
Discipline: Computer Science
Project author: Li Yanni (李雁妮)
Affiliation: Xidian University (西安电子科技大学)
Funding: 810,000 CNY
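Chinese abstract: With the rapid proliferation of online Web databases (WDBs), the Web is deepening quickly. Most of its massive, high-quality information is hidden behind the only entry each WDB exposes, its query interface, and therefore cannot be indexed by traditional search engines. Research on two fundamental and pressing problems in Web information search and Web big data integration, namely the automatic discovery and the integration of the entries of massive numbers of WDBs, is therefore of great significance. Existing research has several shortcomings: it lacks abstract models of these problems, relies on inefficient heuristic algorithms that run serially on a single machine, and offers no feasible, complete solution. To address them, this project describes and solves the problems abstractly and formally, and develops effective modeling methods and efficient distributed parallel algorithms for the key problems in both fields, aiming at a breakthrough that yields feasible and complete solutions. Building on these results, the project will generalize its findings to reveal some basic theories and methods underlying the analysis and processing of general complex and big data problems, offering guidance and a point of reference for solving such problems effectively.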
Chinese keywords: Deep Web; data source entry; discovery; integration
English abstract: The Web has been rapidly deepened by the tremendous number of Web databases (WDBs) online, with potentially unlimited high-quality information hidden behind each WDB's only entry, its searchable form or query interface. Since the Deep Web (most of whose content comes from WDBs) is an important yet largely unexplored frontier, it is receiving great attention in fields such as Web information search and virtual Web big data. However, two basic challenges remain: the Web-scale automatic discovery and integration of WDB query interfaces, which are hard because query interfaces are unstructured and exhibit the 4V properties of Big Data on the Web: Volume, Variety, Velocity and Value. To address these two problems and overcome the limitations of existing work (no formal modeling, inefficient heuristic serial algorithms, and infeasible or incomplete solutions), we will study optimal modeling methods and efficient distributed parallel algorithms based on cloud computing, describing and solving the problems abstractly and formally. More importantly, on this basis, we expect to distill some instructive basic theories and methods for the analysis and processing of complex problems and Big Data.
English keywords: Deep Web; Database Entries; Discovery; Integration