Sentence embedding methods trained on natural language inference (NLI) datasets have been successfully applied to various tasks. However, because they rely heavily on large NLI datasets, these methods are available only for a limited number of languages. In this paper, we propose DefSent, a sentence embedding method that uses definition sentences from a word dictionary. Since dictionaries are available for many languages, DefSent is more broadly applicable than NLI-based methods and requires no additional dataset construction. We demonstrate that DefSent performs comparably to methods using large NLI datasets on unsupervised semantic textual similarity (STS) tasks and slightly better on SentEval tasks. Our code is publicly available at https://github.com/hpprc/defsent.
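As an illustration of how an unsupervised STS comparison is typically scored, the sketch below computes the cosine similarity between the two sentence embeddings in each pair and then the Spearman rank correlation between those similarities and human ratings. The abstract does not spell out the evaluation protocol, so this setup is an assumption. The vectors and gold scores are toy values chosen for the example, not DefSent outputs or reported results.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy embedding pairs (hypothetical, two-dimensional for readability).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # nearly parallel vectors
    ([1.0, 0.0], [0.5, 0.5]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors
]
# Toy human similarity ratings for the same pairs (hypothetical).
gold = [4.8, 2.5, 0.2]

predicted = [cosine(u, v) for u, v in pairs]
score = spearman(predicted, gold)
print(f"Spearman correlation: {score:.3f}")
```

No classifier or regressor is trained on top of the embeddings in this setup, which is what makes the comparison unsupervised: the score depends only on how well the geometry of the embedding space tracks human similarity judgments.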