The goal of this paper is to predict answers to questions given a passage of the Qur'an. The answer is always found in the passage, so the model's task is to predict where the answer starts and where it ends. Because the initial data set is rather small for training, we use multilingual BERT, which lets us augment the training data with data available in languages other than Arabic. Furthermore, we crawl a large Arabic corpus specific to the domain of religious discourse. Our approach consists of two steps. First, we train a BERT model to predict a set of possible answers in a passage. Second, we use another BERT-based model to rank the candidate answers produced by the first model.
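The two-step pipeline described above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the names `candidate_spans`, `rerank`, `top_k`, `max_len`, and `scorer` are hypothetical, and scoring a span as the sum of its start and end scores is a common convention in extractive question answering rather than something the source confirms. The `scorer` callable stands in for the second BERT-based ranking model.

```python
from typing import Callable, List, Sequence, Tuple

# (start index, end index inclusive, combined score)
Span = Tuple[int, int, float]


def candidate_spans(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    top_k: int = 3,
    max_len: int = 10,
) -> List[Span]:
    """Step 1 (assumed form): list the top_k spans by start + end score."""
    spans: List[Span] = []
    for s, s_score in enumerate(start_scores):
        # An answer cannot end before it starts, and is capped at max_len tokens.
        last = min(s + max_len, len(end_scores))
        for e in range(s, last):
            spans.append((s, e, s_score + end_scores[e]))
    spans.sort(key=lambda span: span[2], reverse=True)
    return spans[:top_k]


def rerank(
    tokens: Sequence[str],
    spans: Sequence[Span],
    scorer: Callable[[str], float],
) -> List[str]:
    """Step 2 (assumed form): order candidate texts by a second model's score."""
    texts = [" ".join(tokens[s : e + 1]) for s, e, _ in spans]
    return sorted(texts, key=scorer, reverse=True)


if __name__ == "__main__":
    # Toy per-token scores; a real system would take these from the first model.
    toy_tokens = ["a", "b", "c", "d"]
    toy_spans = candidate_spans([0.1, 2.0, 0.3, 0.0], [0.0, 0.5, 1.5, 0.2])
    # len is only a placeholder for the ranking model.
    print(rerank(toy_tokens, toy_spans, scorer=len))
```

Keeping candidate generation separate from ranking mirrors the two steps in the text: the first function only proposes spans, and any ranking function can be swapped in for `scorer` without touching it.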