We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the ``traditional'' masked-language task. We focus on downstream tasks of learning similarities for recommendations, where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we release a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.