Pre-trained transformer models such as BERT have proven effective in many NLP tasks. This paper presents our work on fine-tuning BERT models for Arabic Word Sense Disambiguation (WSD). We treated WSD as a sentence-pair binary classification task. First, we constructed a dataset of ~167k labeled Arabic context-gloss pairs extracted from the Arabic Ontology and the large lexicographic database available at Birzeit University; each pair was labeled as True or False, and the target word in each context was identified and annotated. Second, we used this dataset to fine-tune three pre-trained Arabic BERT models. Third, we experimented with different supervision signals for emphasizing the target word in its context. Our experiments achieved promising results (84% accuracy) even though a large set of senses was used.
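To make the setup concrete, the sketch below shows one way to frame WSD as sentence-pair binary classification with a pre-trained BERT model. This is not the authors' code: the checkpoint name, the `[TGT]` marker used to emphasize the target word, and the example strings are all illustrative assumptions.

```python
# Minimal sketch of context-gloss pair classification with BERT.
# Checkpoint name and [TGT] marker scheme are assumptions, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "aubmindlab/bert-base-arabertv02"  # assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# One possible supervision signal: surround the target word with marker
# tokens so the model can locate it within the context sentence.
context = "... [TGT] target_word [TGT] ..."   # context with annotated target word
gloss = "dictionary gloss of one candidate sense"

# Encode as a sentence pair: (context, gloss) -> True/False.
inputs = tokenizer(context, gloss, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prob_true = torch.softmax(logits, dim=-1)[0, 1]  # P(gloss matches the target word's sense)
```

At inference time, one would score the context against every candidate gloss for the target word and pick the sense whose gloss receives the highest probability.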