Representing words in a low-dimensional vector space has proven very effective in many NLP tasks. However, this approach works poorly for rare and unseen words. In this paper, we propose to leverage the knowledge in a semantic dictionary, in combination with morphological information, to build an enhanced vector space. We obtain an improvement of 2.3% over the state-of-the-art HeidelTime system in temporal expression recognition, and a large gain in other named entity recognition (NER) tasks. The semantic dictionary HowNet alone also shows promising results in computing lexical similarity.