Distance metric learning based on triplet loss has been applied successfully in a wide range of applications such as face recognition, image retrieval, speaker change detection and, recently, recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because its uniform negative sampling strategy for selecting triplets is too simplistic. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose a 2-stage negative sampling strategy that finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.
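The 2-stage idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: all function and parameter names (`two_stage_negative_sampling`, `pool_size`) are hypothetical. It assumes stage 1 draws a small uniform pool of candidate negatives, and stage 2 keeps the candidate the current model scores closest to the user, i.e. the most informative ("hardest") negative for the triplet loss.

```python
import numpy as np

def two_stage_negative_sampling(user_emb, item_embs, positive_idx,
                                pool_size=64, rng=None):
    """Illustrative 2-stage negative sampling for a triplet loss.

    Stage 1: uniformly sample a small candidate pool of items.
    Stage 2: return the candidate whose embedding is closest to the
             user embedding (the hardest, most informative negative).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_items = item_embs.shape[0]
    # Stage 1: uniform candidate pool, excluding the positive item.
    candidates = rng.choice(n_items, size=pool_size, replace=False)
    candidates = candidates[candidates != positive_idx]
    # Stage 2: hardest candidate = smallest distance to the user.
    dists = np.linalg.norm(item_embs[candidates] - user_emb, axis=1)
    return int(candidates[np.argmin(dists)])
```

Because only a small pool is scored per triplet, the memory cost stays far below what scoring the full catalogue (or relying on very large batches) would require, which is the scaling benefit the abstract claims.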