This paper presents an attempt to build a sentence-level simplification system for Modern Standard Arabic (MSA). We experimented with two approaches to sentence simplification: (i) a classification approach, leading to lexical simplification pipelines that use Arabic-BERT, a pre-trained contextualised model, together with a fastText word embedding model; and (ii) a generative approach, a Seq2Seq technique that applies the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences of the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 under BERTScore, while the combination of Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments. \url{https://github.com/Nouran-Khallaf/Lexical_Simplification}
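To make the reported P, R and F-1 values easier to interpret, the following is a minimal sketch of the greedy-matching computation that underlies BERTScore. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names `cosine` and `bertscore_like` are hypothetical, the two-dimensional vectors stand in for contextual token embeddings, and the optional IDF weighting and baseline rescaling of the full metric are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_like(candidate, reference):
    """Greedy-matching precision, recall and F1 over token embeddings.

    Hypothetical helper: `candidate` and `reference` are lists of token
    vectors. No IDF weighting or baseline rescaling is applied.
    """
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings (assumed values): the candidate covers only one of the
# two reference tokens, so precision is high and recall is lower.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0]]
p, r, f1 = bertscore_like(candidate, reference)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy case the generated sentence contains nothing unsupported by the target (high precision) but misses part of it (lower recall), the same kind of gap between P and R that the mT5 scores show.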