One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open question answering benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
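
To make the retrieval step concrete, the following is a minimal sketch of dense passage retrieval by inner-product scoring, assuming the question and the passages have already been embedded into a shared vector space. It is not CORA's actual retriever: the function name `top_k_passages`, the three-dimensional vectors, and the example passages are hypothetical and hand-made for illustration, whereas a trained multilingual encoder would produce the embeddings in practice.

```python
import numpy as np


def top_k_passages(question_vec, passage_vecs, k=2):
    """Return indices and scores of the k passages with the highest inner product.

    Hypothetical helper for illustration; not part of any released CORA code.
    """
    scores = passage_vecs @ question_vec      # one inner product per passage
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]           # sort by descending score
    return order.tolist(), scores[order].tolist()


# Toy passages in several languages. Language does not enter the scoring:
# only the position of each vector in the shared embedding space matters.
passages = [
    "(en) Tokyo is the capital of Japan.",
    "(ja) 富士山は日本で最も高い山です。",
    "(es) Madrid es la capital de España.",
]

# Hand-made embeddings (assumption): a real system would obtain these
# from a trained multilingual passage encoder.
passage_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.0, 0.9],
])

# Hand-made embedding of a question about Japan, in any language (assumption).
question_vec = np.array([0.8, 0.3, 0.0])

indices, scores = top_k_passages(question_vec, passage_vecs, k=2)
for i, s in zip(indices, scores):
    print(f"{s:.2f}  {passages[i]}")
```

In a pipeline of the kind the abstract describes, the retrieved passages, whatever their language, would then be passed together with the question to a multilingual generation model that produces the answer in the target language.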
