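Project title: Semantic Analysis of Multiword Expressions for Machine Translation and Its Applications
Project number: No.61473294
Project type: General Program
Approval year: 2015
Discipline: Other
Project author: Chen Yufeng (陈钰枫)
Affiliation: Beijing Jiaotong University
Funding: CNY 830,000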
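Abstract (translated from Chinese): Multiword expressions (MWEs) are linguistic units in natural language formed by fixed or semi-fixed collocations. Their semantic representation, effective extraction, and correct translation are difficult problems in natural language processing. In particular, research on extracting Chinese MWEs and on analyzing the differences in semantic structure between Chinese and English MWEs has made no notable progress in recent years, and has become one of the bottlenecks in fields such as information extraction and machine translation. This project will therefore investigate the following: (1) drawing fully on existing linguistic resources, propose a method for the semantic understanding of MWEs based on word-embedding representations; (2) on that basis, propose semantics-based models for extracting Chinese and Chinese-English bilingual MWEs, so that within a joint inference framework MWE extraction reinforces, and is reinforced by, word segmentation, syntactic parsing, and word alignment; (3) build an MWE mining framework based on semantic computation that mines new words from massive Web resources and extracts and filters reliable MWE translation pairs; (4) finally, targeting machine translation, propose a translation framework that incorporates MWE semantic knowledge, introducing this knowledge at two levels to help improve translation system performance. The research carried out in this project has significant theoretical and practical value.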
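Keywords (translated from Chinese): natural language processing; machine translation; multiword expression; semantic analysis; Chinese information processing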
English abstract: Multiword expressions (MWEs) are idiomatic expressions formed by fixed or semi-fixed collocations in natural language. Their semantic interpretation, effective extraction, and precise translation are difficult problems in natural language processing. In particular, the extraction of Chinese MWEs and the differences between Chinese and English MWEs have been a major concern in information extraction and machine translation. The proposed project will therefore conduct the following research. First, drawing on available language resources, we plan to interpret the latent semantic information of MWEs using word embeddings. Second, we propose a semantics-based Chinese and Chinese-English bilingual MWE extraction method, which can give feedback to word segmentation, parsing, and word alignment and improve overall performance. Third, we present a Web data mining framework for MWEs, which can discover new MWEs and select reliable MWE translations from the Web. Finally, we construct an MWE-based translation system, which integrates the semantic information of MWEs to improve translation performance. In summary, the research carried out has important theoretical significance and application value.
English keywords: natural language processing; machine translation; multiword expression; semantic analysis; Chinese information processing