Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges that practitioners at Booking.com encounter when applying bandit algorithms: off-policy optimisation, delayed reward, concept drift, reward design, and business-rule constraints. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of these challenges, giving future practitioners, researchers, and educators a resource for addressing the challenges encountered in the e-commerce industry.
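To make one of these challenges concrete, the sketch below simulates delayed reward: an epsilon-greedy policy on a Bernoulli bandit only observes each reward a fixed number of rounds after acting. It is a minimal, self-contained illustration in plain Python, not the OBP extension described in this paper, and it does not use the OBP API. The function name `simulate_delayed_epsilon_greedy`, the constant delay, the click-through-rate parameters, and the choice of epsilon-greedy are all hypothetical assumptions made for illustration.

```python
import random
from collections import deque


def simulate_delayed_epsilon_greedy(true_ctrs, n_rounds, delay, epsilon=0.1, seed=0):
    """Hypothetical sketch: epsilon-greedy on a Bernoulli bandit where the
    reward of the action taken in round t becomes visible at the start of
    round t + delay (delay=1 is ordinary, immediate bandit feedback)."""
    if delay < 1:
        raise ValueError("delay must be >= 1")
    rng = random.Random(seed)
    n_actions = len(true_ctrs)
    counts = [0] * n_actions   # number of rewards observed so far, per action
    sums = [0.0] * n_actions   # sum of observed rewards, per action
    pending = deque()          # (reveal_round, action, reward), ordered by reveal_round
    chosen = []

    for t in range(n_rounds):
        # Reveal every reward whose delay has elapsed before deciding.
        while pending and pending[0][0] <= t:
            _, a, r = pending.popleft()
            counts[a] += 1
            sums[a] += r

        # Explore with probability epsilon (or if nothing has been observed yet);
        # otherwise exploit the empirical mean of the rewards observed so far.
        if sum(counts) == 0 or rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
            action = max(range(n_actions), key=means.__getitem__)

        # Draw the Bernoulli reward now, but hide it until round t + delay.
        reward = 1.0 if rng.random() < true_ctrs[action] else 0.0
        pending.append((t + delay, action, reward))
        chosen.append(action)

    return chosen, counts


chosen, observed = simulate_delayed_epsilon_greedy([0.02, 0.05, 0.08], n_rounds=100, delay=5)
print(len(chosen), sum(observed))
```

With a delay of 5 and 100 rounds, the policy makes 100 decisions but has only observed 95 rewards by the final decision; the remaining rewards are still in flight. In a real system the delay would typically be random rather than constant, which this sketch deliberately omits.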