Mikolov et al. (2013a) observed that continuous bag-of-words (CBOW) word embeddings tend to underperform Skip-gram (SG) embeddings, and this finding has been echoed in subsequent works. We find that these observations are driven not by fundamental differences in their training objectives, but more likely by faulty negative-sampling CBOW implementations in popular libraries, including the official implementation, word2vec.c, and Gensim. We show that after correcting a bug in the CBOW gradient update, one can learn CBOW word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks, while being many times faster to train.
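The bug in question concerns how the gradient is propagated back to the context vectors. Since the CBOW hidden layer is the *mean* of the context embeddings, each context vector should receive the backpropagated gradient divided by the context size; the faulty implementations apply the full gradient to every context word. The following is a minimal sketch of one corrected negative-sampling CBOW step, assuming NumPy and hypothetical names (`W_in`, `W_out`, `cbow_update`) not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_update(W_in, W_out, context_ids, center_id, negative_ids, lr=0.025):
    """One corrected CBOW negative-sampling update (illustrative sketch).

    W_in:  (V, d) input/context embedding matrix
    W_out: (V, d) output embedding matrix
    """
    C = len(context_ids)
    h = W_in[context_ids].mean(axis=0)  # hidden layer: mean of context vectors

    grad_h = np.zeros_like(h)
    for word_id, label in [(center_id, 1.0)] + [(n, 0.0) for n in negative_ids]:
        score = sigmoid(W_out[word_id] @ h)
        g = lr * (label - score)         # scaled gradient of the log-likelihood
        grad_h += g * W_out[word_id]
        W_out[word_id] += g * h

    # The fix: h is the mean of C context vectors, so each context vector
    # receives grad_h / C. The buggy implementations add the full grad_h
    # to every context vector, omitting the division by C.
    for cid in context_ids:
        W_in[cid] += grad_h / C
```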