Several critical scenarios, such as generating Wikipedia text from English infoboxes, require automated generation of descriptive text in low-resource (LR) languages from English fact triples. Previous work has focused on English fact-to-text (F2T) generation. To the best of our knowledge, there has been no previous attempt at cross-lingual alignment or generation for LR languages. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on the XAlign dataset.