The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem; however, it is limited to the English language. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.
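To make the binary formulation concrete, the following is a minimal illustrative sketch (not the authors' exact pipeline) of a common WiC-style baseline: embed the target word in each of its two contexts with a contextualized multilingual encoder and predict "same meaning" when the cosine similarity of the two embeddings exceeds a threshold tuned on development data. The choice of `xlm-roberta-base`, the character-offset interface, and the threshold value are assumptions made for illustration.

```python
# Illustrative cosine-threshold baseline for the Word-in-Context binary task.
# Assumptions: XLM-R as the encoder, target words given by character offsets,
# and a dev-tuned similarity threshold (0.5 is a placeholder).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def target_embedding(sentence: str, start: int, end: int) -> torch.Tensor:
    """Mean-pool the hidden states of the subword tokens that cover the
    target word, identified by character offsets [start, end)."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    keep = [i for i, (s, e) in enumerate(offsets.tolist())
            if s < end and e > start and e > s]     # skip special tokens
    return hidden[keep].mean(dim=0)

def same_sense(sent1, span1, sent2, span2, threshold=0.5) -> bool:
    """Predict True if the target word has the same meaning in both contexts."""
    v1 = target_embedding(sent1, *span1)
    v2 = target_embedding(sent2, *span2)
    return torch.cosine_similarity(v1, v2, dim=0).item() >= threshold

# Hypothetical German instance ("Bank" as bench vs. financial institution):
# same_sense("Er saß auf der Bank im Park.", (15, 19),
#            "Sie brachte das Geld zur Bank.", (25, 29))  # expected: False
```

In the zero-shot cross-lingual setting described above, the threshold (or a trained classifier on top of the embeddings) would be fit only on the English WiC data and then applied unchanged to the other XL-WiC languages.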