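Project title: Research on Key Techniques for Copy Detection over Cross-Model, Multi-Source Data
Project number: No.61272178
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology; Computer Technology
Principal investigator: Wang Bin (王斌)
Institution: Northeastern University (东北大学)
Funding: 810,000 CNY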
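Chinese abstract (translated): While network technology makes data sharing convenient, it also makes data copying easier, and the resulting problem of data copy detection is increasingly hard to ignore. Data copy detection plays a vital role in software design, piracy prevention, and improving the quality of information retrieval. Existing data copy detection techniques mainly target data of a single type and overlook the heterogeneity of multi-source data, the complexity of copy relationships, the uncertainty of copy direction, and the large scale of copied data. These characteristics make copy detection more complex and more challenging. This project aims to reveal the intrinsic relationships of copying among heterogeneous data in multi-source environments, and to provide an important theoretical basis and detection methods for cross-model, multi-source data copy detection techniques that are better suited to practical applications. The main research topics include: data mapping strategies that support cross-model data, similarity measures that support cross-model copying, multi-dimensional determination of copy direction, and detection optimization algorithms for large-scale data. The project will design, implement, and evaluate the relevant algorithms, strive for breakthroughs in the related theory and techniques, and lay a solid foundation for future practical application and adoption.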
Chinese keywords (translated): copy detection; data model; heterogeneous data; multiple data sources; similarity matching
English abstract: Network technology makes sharing data easy, but it also makes copying data easy, so data copy detection is becoming increasingly important. Data copy detection plays an important role not only in piracy prevention but also in improving the quality of information retrieval from the Web. Existing copy detection techniques, however, focus only on data sources that use the same data model and ignore important features of multiple data sources, such as heterogeneous data models, complex copy relationships, uncertain copy directions, and large-scale data. These features make the problem more complex and more challenging. The proposal aims to uncover the intrinsic relationships of copying among multiple heterogeneous data models, and to provide a theoretical basis and detection approaches for real applications. The primary research topics are key techniques for copy detection: data matching strategies across heterogeneous data models, similarity functions for copying across heterogeneous data sources, multi-dimensional approaches to determining copy direction, and detection optimization algorithms for large-scale data. The project will design, implement, and evaluate the proposed algorithms. We aim to achieve breakthroughs in the related database theory and techniques, and develop a base for re
English keywords: copy detection; data model; heterogeneous data; multi-source data; approximate matching