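Project title: Research on Keyword-Based Search Techniques for Large-Scale Linked Data
Project number: No.61502095
Project type: Young Scientists Fund project
Year of initiation/approval: 2016
Discipline: Computer Science
Project author: 李慧颖 (Li Huiying)
Author affiliation: 东南大学 (Southeast University)
Funding amount: 200,000 yuan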
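Abstract (translated from Chinese): The Linking Open Data project has gathered more than 50 billion RDF triples, with topics spanning many domains such as publications, geography, multimedia, and life sciences. How to help users obtain the data and information they are interested in is one of the central concerns of current Semantic Web research. Compared with SPARQL queries, which require mastering the query language syntax and the schema of the data being queried, keyword queries are better suited to ordinary users. Existing Semantic Web search engines typically offer search only over RDF documents or entities and do not support more complex query needs (such as querying multiple entities and the relationships among them). This project studies keyword-based search over large-scale linked data: multi-granularity linked data summary models and indexing methods; keyword query understanding methods; methods for efficiently converting keyword queries into structured queries (represented as query graphs); and relevance evaluation of query graphs. The ultimate goal is to help users perform efficient and effective keyword search across data sources over large-scale, heterogeneous, interlinked data.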
Keywords (translated from Chinese): Semantic Web;Linked Data;Semantic Search
English abstract: The Linking Open Data project has collected more than 50 billion RDF triples, covering a wide range of topical domains such as publications, geography, media, and life sciences. How to retrieve information from such large-scale linked data is an important problem in the Semantic Web research field. Users usually prefer keyword queries to SPARQL queries, because it is difficult for them to master the query language syntax and the RDF data schema. However, existing Semantic Web search engines provide only RDF document or entity search rather than complex information queries (such as association queries). Our research focuses on the problem of keyword query over large-scale linked data. We study a multi-granularity summary model and an indexing approach for linked data, a query understanding approach, an efficient approach for converting keyword queries into formal queries (represented as schema graphs), and an approach for ranking the schema graphs. The research will help users perform efficient and effective keyword queries across large-scale, heterogeneous linked data.
English keywords: Semantic Web;Linked Data;Semantic Search