Deep learning-based recommendation has become a widely adopted technique in various online applications. Typically, a deployed model is frequently re-trained to capture users' dynamic behaviors from newly collected interaction logs. However, the current training process only uses users' feedback as labels and fails to account for the errors made in previous recommendations. Inspired by the intuition that humans usually reflect on and learn from mistakes, in this paper we attempt to build a self-correction learning loop (dubbed ReLoop) for recommender systems. In particular, a customized loss is employed to encourage each new model version to reduce prediction errors relative to the previous model version during training. Our ReLoop learning framework enables a continual self-correction process in the long run and is thus expected to outperform existing training strategies. Both offline experiments and an online A/B test have been conducted to validate the effectiveness of ReLoop.
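To make the self-correction idea concrete, below is a minimal sketch of what such a customized loss could look like in PyTorch. It is an illustrative assumption rather than the paper's exact formulation: the function name `reloop_loss` and the hyperparameters `margin` and `alpha` are hypothetical, and the hinge-style penalty is one plausible way to encourage the new model to not do worse than the frozen previous version on each sample.

```python
import torch
import torch.nn.functional as F

def reloop_loss(new_logits, prev_logits, labels, margin=0.0, alpha=0.5):
    """Hypothetical self-correction loss sketch (not the paper's exact loss).

    Combines a standard binary cross-entropy term with a hinge term that
    penalizes the new model whenever its per-sample error exceeds that of
    the previous (frozen) model version.
    """
    # Standard supervised loss on the newly collected labels.
    bce = F.binary_cross_entropy_with_logits(
        new_logits, labels, reduction="none"
    )

    # Per-sample error of the frozen previous model; no gradients flow back.
    with torch.no_grad():
        prev_err = F.binary_cross_entropy_with_logits(
            prev_logits, labels, reduction="none"
        )

    # Hinge-style self-correction: penalize only samples where the new
    # model does worse than the previous version (plus an optional margin).
    correction = F.relu(bce - prev_err + margin)

    return (bce + alpha * correction).mean()
```

In use, `prev_logits` would come from running the previous deployed model (kept frozen) over each training batch, so the new model is trained on fresh labels while being steered away from re-making the previous version's mistakes.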