When labeled data is not readily available for a given task and language, annotation projection is one strategy for automatically generating annotated data that can then be used to train supervised systems. Annotation projection is often formulated as the task of transferring labels from a source language to a target language over parallel corpora. In this paper we present T-Projection, a new approach to annotation projection that leverages large pretrained text-to-text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) a candidate generation step, in which a multilingual T5 model generates a set of projection candidates, and (ii) a candidate selection step, in which the candidates are ranked by their translation probabilities. We evaluate our method on three downstream tasks and five different languages. Our results show that T-Projection improves the average F1 score of previous methods by more than 8 points.
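The two-step decomposition (generate candidates, then rank them by translation probability) can be illustrated with a minimal toy sketch. It is not the T-Projection implementation, and everything in it is an assumption made for illustration: the candidate generator enumerates target spans instead of querying a multilingual T5 model, the translation probabilities come from a hypothetical hand-written lexicon (`TOY_LEXICON`) instead of a machine translation system, and the symmetric averaging of the two translation directions is just one plausible way to turn those probabilities into a ranking score.

```python
import math

# Hypothetical word-level translation probabilities p(target | source).
# A real system would obtain probabilities from an MT model instead.
TOY_LEXICON = {("New", "Nueva"): 0.8, ("York", "York"): 0.9}
FLOOR = 1e-4  # probability assigned to word pairs missing from the lexicon


def generate_candidates(target_tokens, max_len=3):
    """Step (i) stand-in: enumerate every target span of up to max_len tokens.

    The paper generates candidates with a multilingual T5 model; plain span
    enumeration is used here only to keep the sketch self-contained.
    """
    n = len(target_tokens)
    return [
        tuple(target_tokens[i:j])
        for i in range(n)
        for j in range(i + 1, min(i + max_len, n) + 1)
    ]


def _mean_best_log_prob(from_tokens, to_tokens, prob):
    """Average, over from_tokens, of the log of the best match in to_tokens."""
    total = sum(
        math.log(max(prob(f, t) for t in to_tokens)) for f in from_tokens
    )
    return total / len(from_tokens)


def score(source_span, candidate):
    """Step (ii) stand-in: symmetric toy translation score (an assumption)."""
    def p(src, tgt):
        return TOY_LEXICON.get((src, tgt), FLOOR)

    # How well each candidate token is explained by some source token.
    forward = _mean_best_log_prob(candidate, source_span, lambda c, s: p(s, c))
    # How well each source token is covered by some candidate token.
    backward = _mean_best_log_prob(source_span, candidate, p)
    return 0.5 * (forward + backward)


def project(source_span, target_tokens):
    """Return the highest-scoring target span for a labeled source span."""
    candidates = generate_candidates(target_tokens)
    return max(candidates, key=lambda c: score(source_span, c))


if __name__ == "__main__":
    # Project the labeled source span "New York" onto a Spanish sentence.
    print(project(["New", "York"], "Vivo en Nueva York".split()))
```

Scoring in both directions matters in this toy setting: a one-directional score would prefer the shorter span "York", because it never penalizes a candidate for leaving the source word "New" untranslated.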