With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation on recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods has been upgraded from a point-wise single item to a list-wise item arrangement. However, this upgrade also results in a high-dimensional space of state-action pairs, making it difficult to learn list-wise representations with good generalization ability. This in turn hinders the exploration of RL agents and causes poor sample efficiency. To address this problem, we propose a novel RL-based approach to ads allocation that learns better list-wise representations by leveraging task-specific signals on the Meituan food delivery platform. Specifically, we propose three different auxiliary tasks, based on reconstruction, prediction, and contrastive learning respectively, according to prior domain knowledge of ads allocation. We conduct extensive experiments on the Meituan food delivery platform to evaluate the effectiveness of the proposed auxiliary tasks. Both offline and online experimental results show that the proposed method learns better list-wise representations and achieves higher revenue for the platform than state-of-the-art baselines.
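As a rough illustration of the auxiliary-task idea, not the paper's actual architecture, the sketch below adds a reconstruction loss and a contrastive (InfoNCE) loss on top of a list-wise representation. The linear encoder, the pooling scheme, and all names here are hypothetical placeholders; in practice these terms would be weighted and added to the RL objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(item_list, W):
    # Toy list-wise encoder (hypothetical): project each item's features,
    # apply a nonlinearity, and mean-pool over the list.
    return np.tanh(item_list @ W).mean(axis=0)

def reconstruction_loss(z, item_list, W_dec):
    # Auxiliary task sketch: reconstruct the pooled item features
    # from the list-wise representation z (mean-squared error).
    recon = z @ W_dec
    target = item_list.mean(axis=0)
    return float(np.mean((recon - target) ** 2))

def contrastive_loss(z, z_pos, z_negs, temperature=0.1):
    # Auxiliary task sketch (InfoNCE): pull an augmented view of the same
    # list close to z, push representations of other lists away.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(z, z_pos)] + [cos(z, n) for n in z_negs]) / temperature
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# Usage with random toy data (shapes and weights are illustrative only).
d, k = 8, 4
items = rng.normal(size=(5, d))            # one list of 5 items
W = rng.normal(size=(d, k))
W_dec = rng.normal(size=(k, d))

z = encode(items, W)
z_pos = encode(items + 0.01 * rng.normal(size=items.shape), W)   # augmented view
z_negs = [encode(rng.normal(size=(5, d)), W) for _ in range(3)]  # other lists

aux = 0.1 * reconstruction_loss(z, items, W_dec) \
    + 0.1 * contrastive_loss(z, z_pos, z_negs)   # added to the RL loss
```

The auxiliary losses shape the encoder so that lists with similar content map to nearby representations, which is the generalization property the abstract argues the high-dimensional state-action space otherwise makes hard to learn.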