The choice of negative examples is important in noise contrastive estimation. Recent works find that hard negatives -- the highest-scoring incorrect examples under the model -- are effective in practice, but they are used without a formal justification. We develop analytical tools to understand the role of hard negatives. Specifically, we view the contrastive loss as a biased estimator of the gradient of the cross-entropy loss, and show both theoretically and empirically that setting the negative distribution to be the model distribution results in bias reduction. We also derive a general form of the score function that unifies various architectures used in text retrieval. By combining hard negatives with appropriate score functions, we obtain strong results on the challenging task of zero-shot entity linking.
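As a concrete illustration of the idea (not code from the paper), the following is a minimal PyTorch sketch of a contrastive loss in which the negatives for each query are the top-K highest-scoring incorrect candidates under the current model, which is how hard negatives approximate drawing negatives from the model distribution. All function and variable names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def hard_negative_nce_loss(scores, gold, num_negatives=7):
    """Contrastive loss over the gold candidate plus the K highest-scoring
    incorrect candidates under the current model (hard negatives).

    scores: (batch, num_candidates) model scores for all candidates
    gold:   (batch,) index of the correct candidate for each example
    """
    batch, num_cands = scores.shape
    # Mask out the gold candidate so it cannot be selected as a negative.
    masked = scores.masked_fill(
        F.one_hot(gold, num_cands).bool(), float("-inf")
    )
    # Hard negatives: top-K incorrect candidates by current model score.
    neg_idx = masked.topk(num_negatives, dim=1).indices          # (batch, K)
    gold_scores = scores.gather(1, gold.unsqueeze(1))            # (batch, 1)
    neg_scores = scores.gather(1, neg_idx)                       # (batch, K)
    logits = torch.cat([gold_scores, neg_scores], dim=1)         # (batch, 1+K)
    # Cross-entropy with the gold fixed in position 0: an NCE-style
    # softmax over the 1+K retained candidates.
    target = torch.zeros(batch, dtype=torch.long, device=scores.device)
    return F.cross_entropy(logits, target)

# Usage sketch: 4 queries, each scored against 100 candidates.
scores = torch.randn(4, 100)
gold = torch.randint(0, 100, (4,))
loss = hard_negative_nce_loss(scores, gold)
```

In the limit where K covers all incorrect candidates, this reduces to the full cross-entropy loss; restricting to the model's top-scoring negatives is the biased estimate of its gradient that the abstract analyzes.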