In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, this shared task requires not only providing a sentence-level score indicating the overall quality of the translation, but also explaining this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines, and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.