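Project title: Perception, Understanding, and Computation of Web Page Data Objects
Project number: No.61462010
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Other
Project author: Zhu Xinhua (朱新华)
Affiliation: Guangxi Normal University
Funding: 450,000 yuan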
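Chinese abstract (translated): The complexity and heterogeneity of Web page data have long been a bottleneck for processing it efficiently. Page segmentation and information extraction have eased the problem to some extent, but they have not fundamentally solved the problem of semantic structuring. To address this challenge, the project takes Web pages as its research object. Based on the characteristics of Web page authoring languages and on the content and distribution of information within a page, it applies theories and techniques from statistics and information science to study semantic perception models and methods for Web page data objects, explore metric learning theory and techniques for semantic objects, construct a cross-granularity weighted semantic object tree model, reveal the association mapping mechanism from semantic objects to granular objects, and establish a multi-granularity view representation of Web pages together with its association model. Specific research topics include: LDA-based topic perception for Web text; short-text classification based on spectral clustering; semantic segmentation based on visual information and information content; metric learning for Web page semantic objects; construction of the cross-granularity weighted semantic object tree; and multi-granularity representation and association modeling of Web pages. The models and methods for the perception, understanding, and computation of Web page data established in this project have important theoretical significance and practical prospects for Web information integration and management, intelligent retrieval, analysis and mining, and other fields.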
Chinese keywords (translated): semantic object;probabilistic topic model;spectral graph theory;metric learning;multi-granularity view
English abstract: The complexity and heterogeneity of Web page data have always been a bottleneck in its efficient processing. Page segmentation and information extraction techniques have eased the problem to a certain extent, but have failed to fundamentally solve the problem of semantic structuring. In response to this challenging problem, the project takes the Web page as its research object. Based on the characteristics of Web page authoring languages and the distribution characteristics of information in the page, it uses related theories and techniques from statistics and information science to study models and methods for the semantic perception of Web page data objects, explore the theory and techniques of metric learning for semantic objects, build a cross-granularity weighted semantic object tree model, reveal the mapping mechanism from semantic objects to granular objects, and establish a multi-granularity view of the Web page and its association model. The main research topics include: perception of the semantic topics of Web page content based on the LDA model; short-text classification based on spectral clustering; semantic segmentation of Web pages based on visual information and information content; metric learning of semantic objects in the Web page; construction of the cross-granularity weighted semantic object tree; and the multi-granularity view and its association modeling. The models and methods for perception, understanding, and computation of Web page data objects established in the project will have theoretical significance and practical prospects in Web information integration and management, intelligent retrieval, analysis and mining, and other fields.
English keywords: semantic object;probabilistic topic model;graph theory;metric learning;multi-granularity view