In online advertising, recommender systems propose items from a list of products to potential customers according to their interests. Such systems are increasingly deployed in e-commerce, driven by the rapid growth of information technology and the availability of large datasets. Progress in artificial intelligence has provided powerful tools for dealing with such real-world problems. Deep reinforcement learning (RL), which uses deep neural networks as universal function approximators, is a viable approach to the design and implementation of recommender systems. This paper presents a comparative study of value-based and policy-based deep RL algorithms for designing recommender systems for online advertising. The RecoGym environment is adopted for training these RL-based recommender systems, with long short-term memory (LSTM) networks used to build the value network in the value-based approach and the policy network in the policy-based approach. LSTM is used to account for the key role that order plays in the sequence of items observed by users. The designed recommender systems aim to maximise the click-through rate (CTR) of the recommended items. Finally, guidelines are provided for choosing suitable RL algorithms for the different scenarios that the recommender system is expected to handle.
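As a rough illustration of the distinction the abstract draws, the sketch below contrasts how a value-based agent and a policy-based agent might pick an item to recommend, and how CTR is computed from the resulting clicks. It is a minimal sketch under stated assumptions, not the paper's implementation: the fixed score list stands in for the output of an LSTM value or policy network, the click probabilities are made up, and all function names are hypothetical. It does not use the RecoGym API.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Value-based selection: usually pick the item with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random item
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


def softmax(logits):
    """Turn raw policy scores into a probability distribution over items."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_policy(logits, rng):
    """Policy-based selection: sample an item from the policy's distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions (0.0 when nothing was shown)."""
    return clicks / impressions if impressions else 0.0


def simulate(select, click_probs, steps, rng):
    """Recommend one item per step and return the CTR of the simulated clicks."""
    clicks = 0
    for _ in range(steps):
        item = select()
        if rng.random() < click_probs[item]:
            clicks += 1
    return click_through_rate(clicks, steps)


if __name__ == "__main__":
    rng = random.Random(0)
    click_probs = [0.02, 0.10, 0.05]  # assumed per-item click probabilities
    scores = [0.1, 0.9, 0.4]          # stand-in for LSTM network outputs
    ctr_value = simulate(lambda: epsilon_greedy(scores, 0.1, rng), click_probs, 10_000, rng)
    ctr_policy = simulate(lambda: sample_from_policy(scores, rng), click_probs, 10_000, rng)
    print(f"value-based CTR:  {ctr_value:.4f}")
    print(f"policy-based CTR: {ctr_policy:.4f}")
```

In this toy setting the scores never change, so the comparison only shows the two selection rules; in a trained agent the scores would be produced by the network from the user's observed item sequence and updated during learning.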