Contrastive learning has seen increasing success in the fields of computer vision and information retrieval in recent years. This poster is the first work that applies contrastive learning to the task of product matching in e-commerce using product offers from different e-shops. More specifically, we employ a supervised contrastive learning technique to pre-train a Transformer encoder which is afterwards fine-tuned for the matching problem using pair-wise training data. We further propose a source-aware sampling strategy which enables contrastive learning to be applied for use cases in which the training data does not contain product identifiers. We show that applying supervised contrastive pre-training in combination with source-aware sampling significantly improves the state-of-the-art performance on several widely used benchmark datasets: For Abt-Buy, we reach an F1 of 94.29 (+3.24 compared to the previous state-of-the-art), for Amazon-Google 79.28 (+3.7). For the WDC Computers datasets, we reach improvements between +0.8 and +8.84 F1 depending on the training set size. Further experiments with data augmentation and self-supervised contrastive pre-training show that the former can be helpful for smaller training sets, while the latter leads to a significant decline in performance due to inherent label noise. We thus conclude that contrastive pre-training has a high potential for product matching use cases in which explicit supervision is available.
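To make the pre-training stage concrete, the following is a minimal sketch of a supervised contrastive (SupCon) pre-training step on product offers, where offers sharing a product identifier serve as positives for each other. The choice of encoder (roberta-base), the example offers and labels, and the temperature value are illustrative assumptions, not taken from the poster, and the sketch omits source-aware batch sampling, optimization details, and the subsequent pair-wise fine-tuning.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive (SupCon) loss: offers that share a product id
    (label) are pulled together, all other offers in the batch are pushed apart."""
    z = F.normalize(embeddings, dim=1)                          # unit-length embeddings
    sim = z @ z.T / temperature                                 # pairwise cosine similarities
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()   # numerical stability
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)        # exclude self-comparisons
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts       # mean log-prob over positives
    return loss[pos_mask.any(dim=1)].mean()                     # average over anchors with positives


# Illustrative pre-training step: encode a batch of offer titles with a
# Transformer encoder and apply SupCon on the first-token embeddings.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")       # assumed encoder choice
encoder = AutoModel.from_pretrained("roberta-base")

offers = ["dell xps 13 9310 i7 16gb", "xps 13 laptop intel i7 16 gb ram",
          "logitech mx master 3 mouse", "mx master 3 wireless mouse"]
labels = torch.tensor([0, 0, 1, 1])                              # product/cluster ids as supervision

batch = tokenizer(offers, padding=True, truncation=True, return_tensors="pt")
cls_emb = encoder(**batch).last_hidden_state[:, 0]               # [CLS]/<s> embedding per offer
loss = supcon_loss(cls_emb, labels)
loss.backward()                                                  # one pre-training step (optimizer omitted)
```

After this pre-training stage, the encoder would be fine-tuned on pair-wise match/non-match data as described above; for use cases without product identifiers, the batch construction would instead rely on the proposed source-aware sampling strategy.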