Open-Domain Generative Question Answering has achieved impressive performance in English by combining document-level retrieval with answer generation. These approaches, which we refer to as GenQA, can generate complete sentences, effectively answering both factoid and non-factoid questions. In this paper, we extend GenQA to the multilingual and cross-lingual settings. For this purpose, we first introduce GenTyDiQA, an extension of the TyDiQA dataset with well-formed and complete answers for Arabic, Bengali, English, Japanese, and Russian. Based on GenTyDiQA, we design a cross-lingual generative model that produces full-sentence answers by exploiting passages written in multiple languages, including languages different from that of the question. Our cross-lingual generative system outperforms answer sentence selection baselines in all five languages and monolingual generative pipelines in three of the five languages studied.