Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, and several methods attempt to bridge the gap between high- and low-resourced languages through data augmentation via translation and through cross-lingual transfer. In this project, we take a step back and study which approaches make the most of existing resources to produce QA systems in many languages. Specifically, we perform extensive analysis to measure the efficacy of few-shot approaches augmented with automatic translations and permutations of context-question-answer pairs. In addition, we offer suggestions for future dataset development efforts that make better use of a fixed annotation budget, with the goal of increasing the language coverage of QA datasets and systems. Code and data for reproducing our experiments are available here: https://github.com/NavidRajabi/EMQA.