Playing Codenames with Language Graphs and Word Embeddings

From arXiv. Divya Koyyalagunta and Anna Sun contributed equally to this work. This is an arXiv version of the paper, which has been accepted for publication in the Journal of Artificial Intelligence Research (JAIR).

Although board games and video games have been studied for decades in artificial intelligence research, challenging word games remain relatively unexplored. Word games are not as constrained as games like chess or poker. Instead, word game strategy is defined by the players' understanding of the way words relate to each other. The word game Codenames provides a unique opportunity to investigate common sense understanding of relationships between words, an important open challenge. We propose an algorithm that can generate Codenames clues from the language graph BabelNet or from any of several embedding methods - word2vec, GloVe, fastText or BERT. We introduce a new scoring function that measures the quality of clues, and we propose a weighting term called DETECT that incorporates dictionary-based word representations and document frequency to improve clue selection. We develop BabelNet-Word Selection Framework (BabelNet-WSF) to improve BabelNet clue quality and overcome the computational barriers that previously prevented leveraging language graphs for Codenames. Extensive experiments with human evaluators demonstrate that our proposed innovations yield state-of-the-art performance, with up to 102.8% improvement in precision@2 in some cases. Overall, this work advances the formal study of word games and approaches for common sense language understanding.
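To make the idea of embedding-based clue selection more concrete, here is a minimal sketch in Python. It is not the paper's scoring function and it omits DETECT and BabelNet-WSF entirely. The scoring rule (lowest similarity to the target words minus the highest similarity to the words to avoid), the `penalty` parameter, the function names, and the hand-made three-dimensional vectors are all assumptions chosen for illustration; a real system would load vectors from word2vec, GloVe, fastText, or BERT.

```python
import math

# Hand-made toy vectors for illustration only; these are NOT real
# word2vec/GloVe/fastText values.
TOY_VECTORS = {
    "ocean": [0.90, 0.10, 0.00],
    "river": [0.80, 0.20, 0.10],
    "water": [0.85, 0.15, 0.05],
    "money": [0.00, 0.10, 0.90],
    "loan":  [0.10, 0.00, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clue_score(clue, targets, avoid, vectors, penalty=1.0):
    """Hypothetical score: reward closeness to every target word and
    penalise closeness to the most similar word that must be avoided."""
    clue_vec = vectors[clue]
    # The weakest target link decides how well the clue covers the group.
    target_sim = min(cosine(clue_vec, vectors[t]) for t in targets)
    # The strongest link to an avoid word is the main risk.
    avoid_sim = max(
        (cosine(clue_vec, vectors[a]) for a in avoid), default=0.0
    )
    return target_sim - penalty * avoid_sim


def best_clue(candidates, targets, avoid, vectors):
    """Return the candidate clue with the highest hypothetical score."""
    return max(
        candidates, key=lambda c: clue_score(c, targets, avoid, vectors)
    )


if __name__ == "__main__":
    chosen = best_clue(
        candidates=["water", "loan"],
        targets=["ocean", "river"],
        avoid=["money"],
        vectors=TOY_VECTORS,
    )
    print(chosen)
```

Taking the minimum over the target words, rather than the mean, is one simple way to favour clues that connect to all intended words instead of being very close to only one of them. Other aggregation rules are equally plausible.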
