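Project title: Key Technologies for Intelligent Storage Management and Retrieval of Large-Scale Unstructured Data
Project number: No.61070054
Project type: General Program
Year of approval: 2011
Discipline: Metallurgy and Metalworking
Project author: Zhang Xiao (张孝)
Author affiliation: Renmin University of China (中国人民大学)
Funding: 100,000 yuan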
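Chinese abstract: In the Web environment, unstructured data such as web pages, multimedia, and electronic documents has reached the petabyte scale and holds a large amount of information and great value; video surveillance data, for example, can be used to track specific objects and to carry out behavior analysis, pattern mining, and other forms of real-time business intelligence. At the same time, the sheer volume and lack of structure make storage management and retrieval increasingly difficult, and efficient and effective techniques are urgently needed to study and solve the key problems from a long-term perspective. This project takes a database approach to the key technologies for intelligent storage management and retrieval of large-scale unstructured data, and has built myBUD, a unified management platform based on free-table technology. myBUD is highly extensible and can support continued growth in data scale as well as new types of data. Through in-depth research and a prototype implementation, the free-table method we propose supports unified, content-based, adaptive storage management of unstructured, semi-structured, and structured data within the system; data extraction that cleans uncertain data for specific queries; and intelligent retrieval and knowledge mining based on the extended cluster feature tree index CFTree*. From this research we conclude that unstructured data management remains an active area of data management. Achieving integrated management and use of structured and unstructured data raises a series of problems that need further study, including refinement of the model, storage scalability, architectures and methods suited to new computing environments, and big data analytics.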
Chinese keywords: unstructured data management; free table; adaptive storage; myBUD; intelligent retrieval
English abstract: The volume of unstructured data on the Web, such as web pages, multimedia, and electronic documents, keeps growing and has reached the petabyte scale. This data holds a large amount of information and business value. Surveillance video, for instance, can be used to track specific objects and then to perform behavior analysis and pattern mining, enabling real-time business intelligence. At the same time, the very large data size and lack of structure make unstructured data increasingly difficult to store and retrieve. In this project, we took a database approach to the key techniques for intelligent storage management and retrieval of large-scale unstructured data, and implemented a universal platform, myBUD, based on Free-Table. myBUD, i.e. my Bank of Unstructured Data, is highly extensible and can support existing types of unstructured data as well as types that emerge in the future. Furthermore, Free-Table enables content-based, unified, adaptive storage management; cleaning of uncertain data for data extraction; and CFTree*-based intelligent search and knowledge mining. Having carried out this project, we believe that unstructured data management (UDM) remains a hot research topic with many open issues, including modeling, extensibility, adaptability to new infrastructure such as cloud environments, and big data analytics.
English keywords: unstructured data management; free-table; adaptive storage; myBUD; intelligent search