Question answering (QA) is one of the most challenging yet widely investigated problems in Natural Language Processing (NLP). QA systems try to produce answers for given questions, and these answers can be generated from unstructured or structured text. Hence, QA is considered an important research area that can be used to evaluate text understanding systems. A large volume of QA studies has been devoted to the English language, investigating the most advanced techniques and achieving state-of-the-art results. However, research on Arabic question answering has progressed at a considerably slower pace because few studies address Arabic QA and large benchmark datasets are lacking. Recently, many pre-trained language models have achieved high performance on many Arabic NLP problems. In this work, we evaluate state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. We fine-tuned and compared the performance of the AraBERTv2-base, AraBERTv0.2-large, and AraELECTRA models. Finally, we provide an analysis to understand and interpret the low performance obtained by some models.