Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which additionally incorporates subword information. We demonstrate that AWE and AWE-S outperform state-of-the-art word embedding models both on a variety of word similarity datasets and when used to initialize NLP models.
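The following is a minimal sketch, not the paper's implementation, contrasting CBOW's uniform averaging of context embeddings with an attention-weighted aggregation of the kind described above; the embedding matrices, the query/key parameters, and the scoring function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50

E_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # input (context) embeddings
E_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # output (target) embeddings
K = rng.normal(scale=0.1, size=(vocab_size, dim))      # hypothetical attention keys
q = rng.normal(scale=0.1, size=dim)                    # hypothetical query vector

context_ids = np.array([12, 7, 431, 88])  # context words around a masked target

# CBOW: every context word contributes equally to the hidden representation.
h_cbow = E_in[context_ids].mean(axis=0)

# Attention-weighted variant: score each context word, softmax-normalize,
# and take a weighted sum so more predictive words contribute more.
scores = K[context_ids] @ q
weights = np.exp(scores - scores.max())
weights /= weights.sum()
h_attn = weights @ E_in[context_ids]

# Either hidden vector then feeds a softmax over the vocabulary to predict
# the masked word; larger dot products with E_out mark likelier targets.
logits_cbow = E_out @ h_cbow
logits_attn = E_out @ h_attn
print(weights, int(logits_attn.argmax()))
```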