XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages

Multiple critical scenarios (such as generating Wikipedia text from English infoboxes) need automated generation of descriptive text in low-resource (LR) languages from English fact triples. Previous work has focused on English fact-to-text (F2T) generation. To the best of our knowledge, there has been no previous attempt at cross-lingual alignment or generation for LR languages. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5,402 pairs have been manually annotated. We also train strong baseline XF2T generation models on the XAlign dataset.
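
To make the notion of an aligned pair concrete, the following is a minimal sketch of how one XF2T example (a set of English fact triples plus one LR sentence) might be represented and flattened into a generator input. The class names, field names, the `<S>`/`<R>`/`<O>` marker tokens, and the Hindi example sentence are all illustrative assumptions; they are not taken from the XAlign dataset or its released format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fact:
    """One English fact triple (hypothetical field names)."""
    subject: str
    relation: str
    obj: str


@dataclass
class AlignedPair:
    """English facts aligned with one sentence in an LR language (hypothetical layout)."""
    facts: List[Fact]
    language: str   # language code of the sentence, e.g. "hi"
    sentence: str   # the LR sentence that the facts are aligned to


def linearize(facts: List[Fact]) -> str:
    """Flatten triples into one string, using assumed marker tokens."""
    return " ".join(
        f"<S> {f.subject} <R> {f.relation} <O> {f.obj}" for f in facts
    )


# Illustrative example only: an English fact paired with a Hindi sentence.
pair = AlignedPair(
    facts=[Fact("Taj Mahal", "located in", "Agra")],
    language="hi",
    sentence="ताजमहल आगरा में स्थित है।",
)

generator_input = linearize(pair.facts)
```

In a setup like this, `generator_input` would be the source side and `pair.sentence` the target side of a sequence-to-sequence training example; the actual input encoding used for the baselines is not specified in the abstract.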

