Quantifying Synthesis and Fusion and their Impact on Machine Translation

Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, the Natural Language Processing (NLP) literature typically assigns a whole language a single, strict morphological type, e.g. fusional or agglutinative. In this work, we propose to reduce the rigidity of such claims by quantifying morphological typology at the word and segment level. We consider the approach of Payne (2017), which classifies morphology using two indices: synthesis (from analytic to polysynthetic) and fusion (from agglutinative to fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. We then analyse the relationship between machine translation quality and the degree of synthesis and fusion at the word level (nouns and verbs for English-Turkish, and verbs for English-Spanish) and at the segment level (the previous language pairs plus English-German, in both directions). We complement the word-level analysis with human evaluation, and overall we observe a consistent impact of both indices on machine translation quality.
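
As a rough illustration of what a synthesis index measures, the sketch below computes the average number of morphemes per word from already-segmented input. This is the classical morphemes-per-word ratio and is only an assumption about the general idea; the abstract does not give the paper's exact formulation. The function name `synthesis_index` and the hand-segmented example words are hypothetical and do not come from the paper.

```python
from typing import Iterable, List


def synthesis_index(segmented_words: Iterable[List[str]]) -> float:
    """Average number of morphemes per word (assumed definition, see lead-in)."""
    words = list(segmented_words)
    if not words:
        raise ValueError("need at least one segmented word")
    # Each word is a list of morphemes, so its length is its morpheme count.
    return sum(len(morphemes) for morphemes in words) / len(words)


# Hypothetical hand-segmented examples, for illustration only.
english = [["the"], ["dog", "s"], ["bark", "ed"]]
turkish = [["ev", "ler", "im", "de"]]  # "evlerimde", roughly "in my houses"

print(synthesis_index(english))
print(synthesis_index(turkish))
```

A higher value points toward the synthetic end of the scale. In practice the segmentations would come from a morphological segmenter rather than from hand-written lists.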

