A key challenge in e-commerce search is how to best utilize large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propose a self-adversarial learning method and a cascade training method. On an offline human evaluation dataset, we achieve a 10% relative improvement in RECALL@20, and in online A/B testing, we achieve a 4.1% improvement in cart-adds per search (CAPS) and a 1.5% improvement in gross merchandise value (GMV). We describe how we train and deploy the embedding-based search model and give a detailed analysis of the effectiveness of our method.
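To make the two-tower setup concrete, the following is a minimal sketch of such an architecture in PyTorch: each tower is a small transformer encoder that maps a token sequence to a unit-norm embedding, and relevance is scored by cosine similarity. The vocabulary size, dimensions, pooling choice, and class names are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal two-tower retrieval sketch (hyperparameters are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TowerEncoder(nn.Module):
    """Transformer encoder mapping a token sequence to one embedding vector."""

    def __init__(self, vocab_size=30000, dim=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))   # (batch, seq_len, dim)
        pooled = hidden.mean(dim=1)                    # mean-pool over tokens
        return F.normalize(pooled, dim=-1)             # unit-norm embedding


class TwoTowerModel(nn.Module):
    """Separate query and product towers; relevance is cosine similarity."""

    def __init__(self):
        super().__init__()
        self.query_tower = TowerEncoder()
        self.product_tower = TowerEncoder()

    def forward(self, query_ids, product_ids):
        q = self.query_tower(query_ids)                # (batch, dim)
        p = self.product_tower(product_ids)            # (batch, dim)
        return (q * p).sum(dim=-1)                     # cosine score per pair


# Usage: score a toy batch of (query, product) token-id pairs.
model = TwoTowerModel()
queries = torch.randint(0, 30000, (2, 8))
products = torch.randint(0, 30000, (2, 16))
print(model(queries, products))
```

Because the two towers share no parameters at inference time, product embeddings can be precomputed and indexed for approximate nearest-neighbor retrieval, with only the query tower run at serving time.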