Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems deployed in real-world scenarios where high-quality translation is needed. While many approaches to quality estimation exist, they are based on supervised machine learning and require costly human-labelled data. As an alternative, we propose a technique that does not rely on examples from human annotators and instead uses synthetic training data. We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve performance comparable to models trained on human-annotated data, for both sentence-level and word-level prediction.