We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first and largest text generation benchmark built on human-annotated multi-way parallel data, with 120k examples covering three tasks (story generation, question generation, and title generation) across four languages (English, German, French, and Spanish). On top of it, we define a range of evaluation scenarios and conduct an in-depth analysis of several popular multilingual generation models from different aspects. We expect the benchmark suite to encourage multilingual research in the text generation community by providing more human-annotated parallel data and more diverse generation scenarios.