We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first multilingual multiway text generation dataset and contains the largest amount of human-annotated data (400k). It covers four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup makes it possible to test a model's ability to transfer knowledge across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite helps improve model performance by providing more human-annotated parallel data, and it supports comprehensive evaluation across diverse generation scenarios. Code and data are available at \url{https://github.com/zide05/MTG}.