Finding the appropriate words to convey a concept (i.e., lexical access) is essential for effective communication. Reverse dictionaries address this need by helping users find the word or words that relate to a given concept or idea. To the best of our knowledge, no such resource has been available for the Persian language. In this paper, we compare four architectures for implementing a Persian reverse dictionary (PREDICT). We evaluate our models on (phrase, word) tuples, in which the phrase describes the word, extracted from the only Persian dictionaries available online, namely Amid, Moein, and Dehkhoda. Given a phrase, a model suggests the words most relevant to the concept it conveys, and the model is considered to perform well if the correct word is among its top suggestions. Our experiments show that a model consisting of Long Short-Term Memory (LSTM) units enhanced by an additive attention mechanism is enough to produce suggestions comparable to (or in some cases better than) the word in the original dictionary. The study also reveals that the model sometimes outputs synonyms of the target word, which led us to introduce a new evaluation metric for reverse dictionaries, Synonym Accuracy: the percentage of cases in which the model produces the target word or one of its synonyms. Under this metric, the best model produces an accurate result within its top 100 suggestions at least 62% of the time.
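The two evaluation criteria described above (the correct word appearing among the top suggestions, and Synonym Accuracy) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this is one plausible reading rather than the authors' implementation. The function names, the `synonyms` lookup table, and the English toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def top_k_accuracy(suggestions: List[List[str]], targets: List[str], k: int) -> float:
    """Fraction of queries whose target word appears in the top-k suggestions."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = sum(target in ranked[:k] for ranked, target in zip(suggestions, targets))
    return hits / len(targets)


def synonym_accuracy(
    suggestions: List[List[str]],
    targets: List[str],
    synonyms: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries whose top-k suggestions contain the target word
    or any of its synonyms (assumed reading of the metric)."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = 0
    for ranked, target in zip(suggestions, targets):
        # Accept the dictionary headword itself or any listed synonym.
        accepted = {target} | synonyms.get(target, set())
        if accepted.intersection(ranked[:k]):
            hits += 1
    return hits / len(targets)


# Hypothetical toy data: one ranked suggestion list per query phrase.
suggestions = [
    ["car", "automobile", "bus"],
    ["joyful", "glad", "cheerful"],
    ["river", "lake", "sea"],
]
targets = ["automobile", "happy", "ocean"]
synonyms = {"happy": {"glad", "joyful"}, "ocean": {"sea"}}

strict_at_2 = top_k_accuracy(suggestions, targets, k=2)
lenient_at_2 = synonym_accuracy(suggestions, targets, synonyms, k=2)
lenient_at_3 = synonym_accuracy(suggestions, targets, synonyms, k=3)
```

In this toy example the strict top-2 score counts only the first query as a hit. The synonym-aware score also credits the second query, and at k=3 the third as well. This shows why a reverse dictionary that returns valid synonyms is undercounted by exact-match accuracy alone.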