Motivated by real-world applications such as fast fashion retailing and online advertising, the Multinomial Logit Bandit (MNL-bandit) is a popular model in online learning and operations research, and has attracted much attention over the past decade. Surprisingly, however, pure exploration, a basic problem in bandit theory, has not been well studied in the MNL-bandit setting so far. In this paper we give efficient algorithms for pure exploration in the MNL-bandit. Our algorithms achieve instance-sensitive pull complexities. We also complement the upper bounds with an almost matching lower bound.