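Project title: Design of a Query Language for Heterogeneous Databases and Research on Its Theoretical Foundations
Project number: No.61502336
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Automation technology; computer technology
Project author: Zhang Xiaowang (张小旺)
Author affiliation: Tianjin University
Funding amount: 200,000 yuan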
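Abstract (translated from Chinese): With the arrival of the big data era, the diversity of data has made research on the key technologies of heterogeneous databases increasingly important. This project designs R-SPARQL, a query language for heterogeneous databases, and studies foundational theoretical questions about the language, such as its primitivity, expressivity, and complexity. The main research content is as follows: 1) Three types of heterogeneous databases are proposed: untyped heterogeneous databases (the type and attributes of each sub-database are not fixed), typed heterogeneous databases (the type and attributes of each sub-database are fixed), and homogeneous databases (also called conventional databases, in which the type and attributes of each sub-database are single). The last type is included in order to compare the query capability of R-SPARQL over heterogeneous databases with its capability over conventional databases. 2) Two kinds of R-SPARQL queries are discussed: SELECT queries (which return a set of mappings) and Boolean queries (which return "true" or "false"). 3) The primitivity, expressivity, and complexity of R-SPARQL are studied for the two kinds of queries over the three types of databases (six cases in total). Finally, a heterogeneous database query system based on R-SPARQL is implemented.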
Keywords (translated from Chinese): Heterogeneous data; Database management; Data integration; Semi-structured data; Schema querying
Abstract (English): With the arrival of the Big Data era, studying heterogeneous databases has become increasingly important because of the variety of data, one of the basic features of Big Data. This proposal designs R-SPARQL, a language for querying heterogeneous databases, and discusses its primitivity, expressivity, and complexity. The main content of this proposal covers the following three aspects: 1) Define three types of heterogeneous databases, namely untyped heterogeneous databases (whose schema is undefined), typed heterogeneous databases (whose schema is defined), and homogeneous databases (whose schema is defined and is a singleton); the last type serves to compare the capabilities of queries over heterogeneous data with those of queries over conventional data. 2) Discuss two kinds of queries, namely SELECT queries (which return a set of mappings, called solutions) and Boolean queries (which return true or false). 3) Study the primitivity, expressivity, and complexity of the six cases formed by the three types of heterogeneous databases and the two kinds of queries above. Finally, this proposal implements an R-SPARQL-based query system for heterogeneous databases.
Keywords (English): Heterogeneous data; Database management; Data integration; Semi-structured data; Schema querying