Extracting synonyms from dictionaries or corpora is gaining attention because synonyms play an important role in improving the performance of NLP applications. This paper surveys the approaches and trends in automatic synonym extraction. These approaches fall into four main categories. The first finds synonyms using a translation graph. The second discovers new translation pairs, for example deriving (Arabic-French) from (Arabic-English) and (English-French). The third constructs new WordNets by exploring synonymy graphs. The fourth finds similar words in corpora using deep learning methods, such as word embeddings and, more recently, BERT models. The paper also presents a comparative analysis of these approaches and highlights their potential adaptation to automatic synonym generation for Arabic as future work.
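To make the first two categories concrete, the following is a minimal sketch, not the method of any surveyed system. It assumes each bilingual dictionary is a plain mapping from a word to a set of translations. The function names `infer_pivot_pairs` and `synonym_candidates` and the toy entries are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def infer_pivot_pairs(src_to_pivot, pivot_to_tgt):
    """Compose two bilingual dictionaries through a shared pivot language."""
    inferred = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            # Every target translation of the pivot word becomes a candidate.
            inferred[src_word].update(pivot_to_tgt.get(pivot_word, ()))
    return dict(inferred)


def synonym_candidates(word_to_translations):
    """Group words of one language that share at least one translation."""
    by_translation = defaultdict(set)
    for word, translations in word_to_translations.items():
        for translation in translations:
            by_translation[translation].add(word)
    # Keep only translations reached from two or more distinct words.
    return {t: words for t, words in by_translation.items() if len(words) > 1}


# Toy, hand-made entries (assumption: illustrative only, not real survey data).
arabic_english = {"كتاب": {"book"}, "سيارة": {"car", "automobile"}}
english_french = {
    "book": {"livre"},
    "car": {"voiture"},
    "automobile": {"automobile", "voiture"},
}

arabic_french = infer_pivot_pairs(arabic_english, english_french)
english_synonyms = synonym_candidates(english_french)
```

Here `arabic_french` holds the inferred (Arabic-French) candidates, and `english_synonyms` groups "car" and "automobile" because both translate to "voiture". Candidates produced this way are noisy when the pivot word is polysemous, so a practical system would need an additional filtering or scoring step.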