The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle polysemy, the fact that a single word form can carry a plurality of meanings. Currently, none of the common language modelling architectures explicitly models polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and may allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction task followed by a sense prediction task. To aid sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more sense-annotated training datasets.
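To make the structured prediction idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the word-then-sense decomposition: a standard next-word head is followed by a sense head conditioned on both the context state and the predicted word's embedding. All names (SenseLM, hidden_dim, the GRU encoder) are illustrative assumptions, and the gloss-encoding Graph Attention Network is omitted for brevity.

```python
# Hypothetical sketch of word-then-sense structured prediction, assuming PyTorch.
import torch
import torch.nn as nn

class SenseLM(nn.Module):
    def __init__(self, vocab_size, num_senses, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Step 1: standard next-word prediction head.
        self.word_head = nn.Linear(hidden_dim, vocab_size)
        # Step 2: sense prediction conditioned on the context state
        # concatenated with the predicted word's embedding.
        self.sense_head = nn.Linear(2 * hidden_dim, num_senses)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))      # (B, T, H) context states
        word_logits = self.word_head(h)          # (B, T, V) next-word scores
        # Condition sense prediction on the argmax word; at training time
        # one would typically teacher-force the gold next word instead.
        pred_words = word_logits.argmax(dim=-1)  # (B, T)
        word_emb = self.embed(pred_words)        # (B, T, H)
        sense_logits = self.sense_head(torch.cat([h, word_emb], dim=-1))
        return word_logits, sense_logits

# Usage: a batch of token ids yields per-position word and sense distributions.
model = SenseLM(vocab_size=10000, num_senses=5000)
word_logits, sense_logits = model(torch.randint(0, 10000, (2, 12)))
```

In a full system, the sense head's output space would be scored against sense representations produced by the Graph Attention Network over definitions and example uses, rather than a flat linear layer as sketched here.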