Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses of this dataset.