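Project title: Construction of a Syntax-Level Chinese-English Parallel Corpus Based on CSSCI and Research on Knowledge Mining
Project number: No.71303120
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Management science
Principal investigator: Wang Dongbo (王东波)
Affiliation: Nanjing Agricultural University (南京农业大学)
Funding: 200,000 CNY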
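Abstract (Chinese): Existing Chinese-English parallel corpora are only shallowly annotated. To address this, the project builds a syntax-level Chinese-English parallel corpus for the humanities and social sciences, based on a Chinese-English syntactic function knowledge base and a syntactic function matching algorithm, and uses the corpus to explore the mining of terms and category knowledge. The main research tasks are: determining the parts of speech of Chinese and English keywords from the part-of-speech distribution tendencies of CSSCI keywords; building a Chinese-English syntactic function knowledge base on the basis of the Tsinghua Chinese Treebank and the Penn English Treebank; building automatic parsers through the generation, disambiguation, optimization, and error recovery of Chinese and English syntactic trees; developing a tool that assists syntactic tree correction and completing the correction of the Chinese and English syntactic trees; and mining terms and category knowledge from the distribution of syntactic structures. The project not only helps to enrich, establish, and advance the concept of syntactic function matching, the theory of syntactic tree construction, and linguistic theory, but also directly promotes research on knowledge services, cross-language retrieval, the semantic web and ontologies, and machine translation.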
Chinese keywords: CSSCI; Chinese-English parallel corpus; syntactic parser; knowledge mining
Abstract (English): Because existing Chinese-English parallel corpora lack deep annotation, this project constructs a syntax-level Chinese-English parallel corpus for the humanities and social sciences, based on a Chinese-English syntactic function knowledge base and a syntactic function matching algorithm, and uses the corpus to investigate the mining of parallel terms and category knowledge. The main research tasks are as follows: calculating the parts of speech of Chinese and English keywords from the part-of-speech distribution tendencies of CSSCI keywords; constructing the syntactic function knowledge base on the basis of the Tsinghua Chinese Treebank and the Penn English Treebank; constructing Chinese and English parsers through the generation, disambiguation, optimization, and error recovery of syntactic trees; designing a tool that assists the correction of syntactic trees and completing that correction; and mining terms and category knowledge from the distribution of syntactic structures. The project will help to enrich, establish, and advance the idea of syntactic function matching, the theory of syntactic tree construction, and linguistic theory, and will directly promote research on knowledge services, cross-language information retrieval, the semantic web and ontologies, and machine translation.
English keywords: CSSCI; Chinese-English parallel corpus; parser; knowledge mining