In recent years, we have witnessed great progress in different natural language understanding tasks using machine learning. Question answering is one of these tasks, and it is used by search engines and social media platforms to improve the user experience. Arabic is the language of the Holy Qur'an, the sacred text for 1.8 billion people across the world. Arabic is a challenging language for Natural Language Processing (NLP) due to its complex structures. In this article, we describe our attempts at the OSACT5 Qur'an QA 2022 Shared Task, a question answering challenge on the Holy Qur'an in Arabic. We propose an ensemble learning model based on Arabic variants of BERT models. In addition, we perform post-processing to enhance the model predictions. Our system achieves a Partial Reciprocal Rank (pRR) score of 56.6% on the official test set.