We propose a method for easily modifying existing offline Recommender Systems to run online using Transfer Learning. Online Learning for Recommender Systems has two main advantages: quality and scale. Like many Machine Learning algorithms in production, a recommender that is not regularly retrained will suffer from Concept Drift. A policy that is updated frequently online can adapt to drift faster than a batch system. This is especially true for user-interaction systems like recommenders, where the underlying distribution can shift drastically to follow user behaviour. As a platform such as Grubhub grows rapidly, the cost of running batch training jobs becomes material. A shift from stateless batch learning offline to stateful incremental learning online can recover, as it did at Grubhub, up to a 45x cost savings and a +20% increase in key metrics. There are a few challenges to overcome in the transition to online stateful learning, namely convergence, non-stationary embeddings, and off-policy evaluation, which we explore based on our experience running this system in production.
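The core mechanism can be sketched simply: the learned state of the offline batch model becomes the initialization of an online learner, which is then updated incrementally on the live event stream instead of being retrained from scratch. The snippet below is a minimal illustration of that idea, not the system described in this paper; it uses scikit-learn's SGDClassifier as a stand-in model, and the data-loading helpers (`load_offline_interactions`, `stream_of_interaction_batches`) are hypothetical placeholders for a real feature and event pipeline.

```python
# A minimal sketch (not the paper's implementation) of warm-starting an online
# learner from an offline batch model and then updating it incrementally.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def load_offline_interactions(n=10_000, d=32):
    # Hypothetical placeholder for historical (user, item) features and labels.
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)
    return X, y

def stream_of_interaction_batches(n_batches=100, batch_size=256, d=32):
    # Hypothetical placeholder for a live event stream; the relationship
    # between features and labels drifts gradually over time.
    for t in range(n_batches):
        X = rng.normal(size=(batch_size, d))
        drift = 0.01 * t
        y = (X[:, 0] + drift * X[:, 1] > 0).astype(int)
        yield X, y

# 1) Offline: one-off batch training, as in a stateless retraining pipeline.
model = SGDClassifier(loss="log_loss", alpha=1e-4)
X_hist, y_hist = load_offline_interactions()
model.fit(X_hist, y_hist)

# 2) Online: keep the learned state and apply incremental updates per batch,
#    so the policy can track drift without a full retrain.
for X_new, y_new in stream_of_interaction_batches():
    model.partial_fit(X_new, y_new, classes=np.array([0, 1]))
```

Because `partial_fit` continues from the coefficients learned during the offline `fit`, each online update costs one small gradient step per batch rather than a full pass over historical data, which is the source of the cost and responsiveness gains discussed above.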