We present the task of PreQuEL, Pre-(Quality-Estimation) Learning. A PreQuEL system predicts how well a given sentence will be translated, without recourse to the actual translation, thus eschewing unnecessary resource allocation when translation quality is bound to be low. PreQuEL can be defined relative to a given MT system (e.g., some industry service) or generally, relative to the state of the art. From a theoretical perspective, PreQuEL places the focus on the source text, tracing properties, possibly linguistic features, that make a sentence harder to machine translate. We develop a baseline model for the task and analyze its performance. We also develop a data augmentation method, based on parallel corpora, that improves results substantially. We show that this augmentation method can improve performance on the Quality Estimation task as well. We investigate which properties of the input text our model is sensitive to by testing it on challenge sets and different languages. We conclude that it is aware of syntactic and semantic distinctions, and correlates with, and even over-emphasizes, the importance of standard NLP features.