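Research on Key Technologies for Domain-Ontology-Oriented Aggregation and Semantic Annotation of Multi-Source Heterogeneous Data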

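Project title: Research on Key Technologies for Domain-Ontology-Oriented Aggregation and Semantic Annotation of Multi-Source Heterogeneous Data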

Project number: No.61272015

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Principal investigator: Zhang Ruiling (张瑞玲)

Affiliation: Luoyang Normal University (洛阳师范学院)

Funding: 610,000 CNY

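Abstract (translated from the Chinese): Starting from the application of semantic reasoning in e-commerce, the project studies key technologies including e-commerce domain ontology construction, multi-source heterogeneous data aggregation, and semantic annotation methods. To strengthen the system's robustness and noise resistance, it applies rough sets, formal concept analysis, and fuzzy set theory to build an e-commerce domain ontology with UNSPSC as the core ontology. To reduce time complexity and improve the accuracy of ontology merging, it proposes an ontology merging method based on isomorphic generation of rough concept lattices. It designs a multi-source heterogeneous data aggregation framework: to improve the speed and accuracy of data annotation, a web crawler and statistical tools are used to obtain a domain high-frequency vocabulary table, and Deep Web data are annotated by querying that table, which achieves heterogeneous data aggregation. Trapezoidal fuzzy numbers are introduced to represent fuzzy similarity, and the composite similarity between concepts is computed through weighted aggregation and defuzzification, improving the efficiency of mapping and matching. To reduce noise and speed up page acquisition, an ontology-based page filtering algorithm for focused crawlers is proposed. A semantic annotation framework is designed: when crawling web pages, a block hash function method is proposed for URL deduplication to balance high space efficiency against the false-positive rate; to overcome the limitations of the classification data and improve classification efficiency, a classification algorithm based on multi-kernel multi-class support vector machines is proposed…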
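The abstract does not spell out the block hash function method for URL deduplication. One plausible reading, offered here only as a sketch, is a blocked Bloom filter: one hash picks a fixed-size block of bits, and the remaining hashes set bits inside that block only. The class name `BlockedBloomFilter`, its parameters, and the use of SHA-256 are all assumptions made for illustration and are not taken from the project.

```python
import hashlib


class BlockedBloomFilter:
    """Hypothetical blocked Bloom filter for URL deduplication (illustrative only)."""

    def __init__(self, num_blocks=1024, block_bits=512, num_hashes=4):
        # A SHA-256 digest has 32 bytes: 4 select the block, 4 more per bit position.
        if not 1 <= num_hashes <= 7:
            raise ValueError("num_hashes must be between 1 and 7")
        if block_bits % 8 != 0:
            raise ValueError("block_bits must be a multiple of 8")
        self.num_blocks = num_blocks
        self.block_bits = block_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_blocks * block_bits // 8)

    def _positions(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        # The first 4 bytes choose the block; all bits for this URL stay inside it.
        block = int.from_bytes(digest[:4], "big") % self.num_blocks
        base = block * self.block_bits
        for i in range(self.num_hashes):
            chunk = digest[4 + 4 * i: 8 + 4 * i]
            yield base + int.from_bytes(chunk, "big") % self.block_bits

    def add(self, url):
        """Record the URL; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(url):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


crawled = BlockedBloomFilter()
first_visit = crawled.add("http://example.com/item?id=1")
second_visit = crawled.add("http://example.com/item?id=1")
```

A filter of this kind never reports a stored URL as new, but it can occasionally report a new URL as already seen. Larger blocks or more blocks lower that false-positive rate at the cost of memory, which is the trade-off the abstract refers to.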

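Keywords (translated from the Chinese): rough concept lattice; domain ontology; ontology mapping; concept similarity; semantic annotation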

English abstract: The project investigates ontology construction, multi-source heterogeneous data aggregation, and semantic annotation for the e-business domain, with a view to applying semantic reasoning in e-commerce. To enhance system robustness and noise resistance, the e-business domain ontology model is built by integrating rough sets, formal concept analysis, and fuzzy set theory, with the original ontology model of UNSPSC (United Nations Standard Products and Services Code) serving as the core ontology. To reduce time complexity and improve the accuracy and efficiency of ontology merging, an ontology merging method based on isomorphic generation of rough concept lattices is presented. A framework for multi-source heterogeneous data aggregation is designed. Domain high-frequency vocabulary tables are obtained with a web crawler and statistical tools. Deep Web databases are annotated by querying the high-frequency vocabulary table, which implements heterogeneous data aggregation and improves the speed and accuracy of data annotation. The composite similarity between ontology concepts is computed by introducing trapezoidal fuzzy numbers to represent fuzzy similarity, followed by weighted aggregation and defuzzification, in order to improve the effectiveness of mapping and matching. The fil
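The composite similarity step can be illustrated with a minimal sketch. It assumes each partial similarity (for example, name-based and structure-based) is a trapezoidal fuzzy number (a, b, c, d) with a ≤ b ≤ c ≤ d, that weighted aggregation is a component-wise weighted average, and that defuzzification uses the centroid of the trapezoid. The abstract specifies none of these choices, and the names `Trapezoid`, `weighted_aggregate`, and `centroid` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy number with support [a, d] and core [b, c]."""
    a: float
    b: float
    c: float
    d: float


def weighted_aggregate(weighted_items):
    """Component-wise weighted average of (weight, Trapezoid) pairs."""
    total = sum(w for w, _ in weighted_items)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return Trapezoid(
        sum(w * t.a for w, t in weighted_items) / total,
        sum(w * t.b for w, t in weighted_items) / total,
        sum(w * t.c for w, t in weighted_items) / total,
        sum(w * t.d for w, t in weighted_items) / total,
    )


def centroid(t):
    """Defuzzify by the x-coordinate of the trapezoid's centroid."""
    denom = (t.d + t.c) - (t.a + t.b)
    if abs(denom) < 1e-12:
        # Degenerate case: a == b == c == d, so the number is already crisp.
        return t.a
    numer = (t.d ** 2 + t.c ** 2 + t.c * t.d) - (t.a ** 2 + t.b ** 2 + t.a * t.b)
    return numer / (3.0 * denom)


# Illustrative partial similarities between two concepts (made-up values).
name_sim = Trapezoid(0.6, 0.7, 0.8, 0.9)
structure_sim = Trapezoid(0.2, 0.3, 0.4, 0.5)

composite = weighted_aggregate([(0.5, name_sim), (0.5, structure_sim)])
score = centroid(composite)
```

The crisp `score` can then be compared against a threshold to decide whether two concepts should be mapped. Other defuzzification rules would fit the same structure.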

English keywords: rough concept lattice; domain ontology; ontology mapping; concept similarity; semantic annotation
