This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations scaling quadratically with the sample size. This complexity is unsuited to large datasets, and negative sampling is a popular workaround that allows one to obtain distributed representations in time linear in the sample size. Negative sampling, however, amounts to a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling, at a markedly lower computational time.
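To make the complexity claim concrete, the sketch below contrasts the exact softmax normalization constant, which costs O(n) per item and hence O(n^2) for all items, with a simple Monte Carlo estimate that costs O(m) per item. This is a minimal illustration of why estimating the constants breaks the quadratic bottleneck; it is not the estimator proposed in the article, and all names and sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 16  # hypothetical vocabulary size and embedding dimension
U = rng.normal(size=(n, d)) / np.sqrt(d)  # "input" embeddings
V = rng.normal(size=(n, d)) / np.sqrt(d)  # "output" embeddings

def exact_Z(i):
    # Exact normalization constant Z_i = sum_j exp(u_i . v_j):
    # O(n) per item, so O(n^2) to normalize every softmax in the objective.
    return np.exp(U[i] @ V.T).sum()

def estimated_Z(i, m=100):
    # Naive Monte Carlo estimate from m uniformly drawn items:
    # Z_i ~ (n/m) * sum_{j in sample} exp(u_i . v_j), only O(m) per item.
    idx = rng.integers(0, n, size=m)
    return n * np.exp(U[i] @ V[idx].T).mean()

print(exact_Z(0), estimated_Z(0))
```

With random embeddings of this scale the uniform estimate typically lands within a few percent of the exact constant; the article's point is that a linear-time estimate of this kind lets one optimize the original softmax objective instead of switching to the negative-sampling loss.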