E-commerce websites use machine-learned ranking models to serve shopping results to customers. Typically, these websites log customer search events, each of which includes the query entered and the resulting engagement with the shopping results, such as clicks and purchases. Each customer search event serves as input training data for the models, and the individual customer engagement serves as a signal of customer preference: a purchased shopping result, for example, is treated as more important than one that was not purchased. However, new or under-impressed products do not have enough customer engagement signals and end up at a disadvantage when ranked alongside popular products. In this paper, we propose a novel data curation method that aggregates all customer engagements within a day for the same query and uses the aggregate as input training data. This aggregated customer engagement gives the models a complete picture of the relative importance of shopping results. Training models on the aggregated data leads to less reliance on behavioral features, which helps mitigate the cold start problem and boosts relevant new products to the top search results. We present offline and online analyses and results comparing the individual and aggregated customer engagement models trained on e-commerce data.
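The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SearchEvent` fields, the function names, and the click and purchase weights in `relevance_label` are all hypothetical, chosen only to show how engagement logged per search event might be pooled per query and day before being turned into a training label.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchEvent:
    """One product shown in one logged search event (hypothetical schema)."""
    day: date
    query: str
    product_id: str
    clicked: bool
    purchased: bool


def aggregate_daily(events):
    """Pool engagement over all events that share the same day and query.

    Returns {(day, query): {product_id: {"impressions", "clicks", "purchases"}}}.
    """
    groups = defaultdict(
        lambda: defaultdict(lambda: {"impressions": 0, "clicks": 0, "purchases": 0})
    )
    for e in events:
        counts = groups[(e.day, e.query)][e.product_id]
        counts["impressions"] += 1
        counts["clicks"] += int(e.clicked)
        counts["purchases"] += int(e.purchased)
    # Convert nested defaultdicts to plain dicts for safer downstream use.
    return {key: dict(products) for key, products in groups.items()}


def relevance_label(counts, purchase_weight=2.0, click_weight=1.0):
    """Assumed labeling rule: a weighted sum where purchases outweigh clicks."""
    return purchase_weight * counts["purchases"] + click_weight * counts["clicks"]


if __name__ == "__main__":
    d = date(2024, 1, 1)
    events = [
        SearchEvent(d, "running shoes", "A", clicked=True, purchased=True),
        SearchEvent(d, "running shoes", "B", clicked=True, purchased=False),
        SearchEvent(d, "running shoes", "A", clicked=True, purchased=False),
        SearchEvent(d, "running shoes", "B", clicked=False, purchased=False),
    ]
    for key, products in aggregate_daily(events).items():
        for product_id, counts in products.items():
            print(key, product_id, counts, relevance_label(counts))
```

In this sketch, each (day, query) group becomes a single training example whose labels reflect the pooled engagement of all customers, rather than one example per customer session. How the paper actually derives labels from the aggregate is not stated in the abstract, so the weighted sum here is only a placeholder.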