This paper addresses the task of extending a given synset with additional synonyms, treating synonymy strength as a fuzzy value. Given a monolingual or multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms above this threshold from existing lexicons. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset serves two purposes: (i) understanding how much linguists agree or disagree on synonymy, and (ii) providing a baseline against which to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluation shows that the algorithm behaves like a linguist: its fuzzy values are close to those assigned by the linguists, as measured by RMSE and MAE. The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms.
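The thresholding step and the two error measures named above (RMSE and MAE) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the example words, and all scores are hypothetical. It also assumes that "above this threshold" means strictly greater than, and that each candidate is compared against a single reference value, such as the mean of the four linguists' annotations.

```python
import math


def above_threshold(scored_candidates, threshold):
    """Keep candidates whose fuzzy value is strictly above the threshold.

    Strict comparison is an assumption; the abstract only says "above".
    """
    return {w: s for w, s in scored_candidates.items() if s > threshold}


def mae(predicted, reference):
    """Mean absolute error between two equal-length lists of fuzzy values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rmse(predicted, reference):
    """Root mean squared error between two equal-length lists of fuzzy values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical fuzzy values for three candidate synonyms of one synset.
algorithm_scores = {"car": 0.9, "vehicle": 0.6, "truck": 0.3}
# Hypothetical reference values, e.g. the mean of the linguists' annotations.
linguist_scores = {"car": 1.0, "vehicle": 0.5, "truck": 0.3}

accepted = above_threshold(algorithm_scores, threshold=0.5)

words = sorted(algorithm_scores)
predicted = [algorithm_scores[w] for w in words]
reference = [linguist_scores[w] for w in words]
print(accepted, mae(predicted, reference), rmse(predicted, reference))
```

Lower RMSE and MAE mean the computed fuzzy values sit closer to the reference annotations. RMSE penalizes large individual deviations more heavily than MAE does.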