A combinatorial recommender (CR) system presents a list of items to a user at once on a result page, where user behavior is affected by both contextual information and the items themselves. CR is formulated as a combinatorial optimization problem whose objective is to maximize the recommendation reward of the whole list. Despite its importance, building a practical CR system remains challenging because of the efficiency, dynamics, and personalization requirements of the online environment. To address this, we decompose the problem into two sub-problems: list generation and list evaluation. We design novel and practical model architectures for these sub-problems that jointly optimize effectiveness and efficiency. To adapt to the online setting, we propose a bootstrap algorithm that forms an actor-critic reinforcement learning framework to explore better recommendation modes over long-term user interaction. Offline and online experiment results demonstrate the efficacy of the proposed JDRec framework. JDRec has been deployed in JD's online recommendation system, improving click-through rate by 2.6% and synthetical value for the platform by 5.03%. We will publish the large-scale dataset used in this study to contribute to the research community.
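To illustrate why a list is scored as a whole and why generation and evaluation are separated, here is a minimal sketch that assumes a hypothetical list-level reward: position-discounted relevance, with a penalty when an item repeats a category already shown. The generator (greedy top-k plus single-swap variants) and the evaluator (a hand-written scoring function) are illustrative stand-ins, not the learned models in JDRec. The names `generate_candidates`, `evaluate_list`, `recommend`, `decay`, and `dup_penalty` are hypothetical, and the actor-critic bootstrap loop is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical item representation: (item_id, relevance, category).
Item = Tuple[str, float, str]


def generate_candidates(items: Sequence[Item], k: int) -> List[List[Item]]:
    """Toy list generator: the greedy top-k list plus every single-swap variant.

    Each variant replaces one slot of the greedy list with an item the greedy
    list left out, so the candidate set stays small (k * (n - k) + 1 lists).
    """
    ranked = sorted(items, key=lambda it: it[1], reverse=True)
    greedy, rest = ranked[:k], ranked[k:]
    candidates = [greedy]
    for pos in range(len(greedy)):
        for alt in rest:
            variant = list(greedy)
            variant[pos] = alt
            candidates.append(variant)
    return candidates


def evaluate_list(lst: Sequence[Item], decay: float = 0.8, dup_penalty: float = 0.5) -> float:
    """Toy list evaluator: scores the whole list rather than items independently.

    Each item contributes relevance * decay**position; an item whose category
    already appeared earlier in the list has its contribution multiplied by
    dup_penalty, so the reward depends on which items appear together.
    """
    seen = set()
    total = 0.0
    for pos, (_, rel, cat) in enumerate(lst):
        gain = rel * decay ** pos
        if cat in seen:
            gain *= dup_penalty
        seen.add(cat)
        total += gain
    return total


def recommend(items: Sequence[Item], k: int) -> List[Item]:
    """Generate candidate lists, then return the one the evaluator scores highest."""
    return max(generate_candidates(items, k), key=evaluate_list)


if __name__ == "__main__":
    catalog = [("a", 0.9, "shoes"), ("b", 0.85, "shoes"), ("c", 0.6, "phones")]
    best = recommend(catalog, k=2)
    print([item_id for item_id, _, _ in best])  # ['a', 'c']
```

Under this assumed reward, the greedy list [a, b] scores 1.24 while [a, c] scores 1.38, because the repeated category halves b's contribution. This kind of interaction between items is what makes per-item ranking insufficient and motivates evaluating candidate lists as a whole.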