Translation quality evaluation plays a crucial role in machine translation. Depending on the input format, it mainly falls into three tasks, i.e., reference-only, source-only, and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized for one of these tasks. This limits the convenience of these methods and overlooks the commonalities among the tasks. In this paper, we propose UniTE, the first unified framework capable of handling all three evaluation tasks. Concretely, we propose monotonic regional attention to control the interaction among input segments, and unified pretraining to better adapt to multi-task learning. We evaluate our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks. Extensive analyses show that our \textit{single model} can universally surpass various state-of-the-art or winner methods across tasks. Both the source code and associated models are available at https://github.com/NLP2CT/UniTE.
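The abstract mentions regional attention that controls which input segments (e.g., hypothesis, source, reference) may interact. As a minimal sketch, not the paper's actual implementation, the function below builds a segment-level boolean attention mask from hypothetical segment lengths and a set of allowed (query-segment, key-segment) pairs; the specific segments, lengths, and allowed pairs in the example are illustrative assumptions.

```python
import numpy as np

def regional_attention_mask(seg_lens, allowed_pairs):
    """Build a boolean attention mask over a concatenated input.

    seg_lens: token lengths of each segment, e.g. [hypothesis, source, reference].
    allowed_pairs: set of (query_segment, key_segment) index pairs allowed to attend.
    Returns a (total_len, total_len) boolean matrix; True = attention permitted.
    """
    offsets = np.cumsum([0] + list(seg_lens))  # segment boundary positions
    total = offsets[-1]
    mask = np.zeros((total, total), dtype=bool)
    for q, k in allowed_pairs:
        # Unblock the rectangular region: queries in segment q, keys in segment k.
        mask[offsets[q]:offsets[q + 1], offsets[k]:offsets[k + 1]] = True
    return mask

# Hypothetical configuration: hypothesis (3 tokens), source (2), reference (2).
# Each segment attends to itself and to the hypothesis; the hypothesis attends
# to everything; source and reference do not attend to each other.
pairs = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 2)}
mask = regional_attention_mask([3, 2, 2], pairs)
```

Such a mask would typically be converted to additive form (0 for allowed, a large negative value for blocked) before being applied inside a transformer's attention layer.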