Learning low-dimensional representations of entities and relations in knowledge graphs with contrastive estimation is a scalable and effective way to infer connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution used to generate hard negative samples, which force the embedding model to learn discriminative representations and to find critical characteristics of the observed data. Earlier methods either employ overly simple corruption distributions (i.e., uniform), which yield easy, uninformative negatives, or sophisticated adversarial distributions that require challenging optimization schemes. In both cases, they do not explicitly incorporate known graph structure, which results in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that exploits the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds high-quality negatives, is highly competitive with state-of-the-art (SOTA) methods, and requires neither additional parameters nor difficult adversarial optimization.
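The idea of drawing negatives from a node's k-hop neighborhood can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_adjacency`, `k_hop_neighborhood`, `sample_negative_tails`), the breadth-first search, the undirected treatment of edges, and the filtering of known true tails are all assumptions made for illustration.

```python
import random
from collections import defaultdict, deque


def build_adjacency(triples):
    """Entity adjacency from (head, relation, tail) triples.

    Assumption: edges are treated as undirected and relation types are ignored.
    """
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def k_hop_neighborhood(adj, node, k):
    """Return the entities reachable from `node` in 1..k hops (excluding `node`)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(node)
    return seen


def sample_negative_tails(triples, head, relation, k, n, rng=None):
    """Corrupt the tail of (head, relation, ?) with entities near `head`.

    Assumption: known true tails are filtered out, and an empty list is
    returned when the neighborhood contains no usable candidate.
    """
    rng = rng or random.Random()
    adj = build_adjacency(triples)
    true_tails = {t for h, r, t in triples if h == head and r == relation}
    # Sort so that results are reproducible for a seeded generator.
    candidates = sorted(k_hop_neighborhood(adj, head, k) - true_tails)
    if not candidates:
        return []
    return [(head, relation, rng.choice(candidates)) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical toy graph: a chain a - b - c - d - e.
    toy = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("d", "r", "e")]
    print(sample_negative_tails(toy, "a", "r", k=3, n=4, rng=random.Random(0)))
```

In this sketch the neighborhood is recomputed on every call for clarity. A practical version would presumably precompute and cache each node's k-hop neighborhood once before training, since the graph does not change.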