We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated in all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks, we built on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance on downstream tasks across several language pairs, and that jointly training with sentence- and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the best strategy for extracting good explanations from sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.