We consider the best arm identification (BAI) problem in the stochastic multi-armed bandit framework, where each arm has a tiny probability of realizing a large reward and, with overwhelming probability, yields zero reward. A key application of this framework is online advertising, where the click rate of an advertisement can be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rate. Recently, algorithms for BAI have been developed that minimize sample complexity while providing statistical guarantees on correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a compound Poisson process to arrive at algorithms that are faster, at the cost of a small increase in sample complexity. We analyze the problem in an asymptotic regime in which the probability of a reward occurrence shrinks to zero while reward amounts grow to infinity. This regime illustrates the benefits of the proposed algorithm and sheds light on the underlying structure of optimal BAI algorithms in the rare-event setting.
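To make the rare-reward model concrete, here is a minimal Python sketch of pulling such an arm: each pull returns zero with overwhelming probability and a large random payout otherwise, so the number of nonzero rewards over many pulls is approximately Poisson and the aggregate reward approximately compound Poisson. The function names, the hit probability `p`, and the exponential payout distribution are illustrative assumptions, not specifics from the paper.

```python
import random

def sample_arm(p=1e-3, payout=lambda: random.expovariate(1 / 1000.0)):
    """One pull of a rare-reward arm: zero with probability 1 - p,
    otherwise a large random payout (here exponential, mean 1000)."""
    return payout() if random.random() < p else 0.0

def total_reward(n, p=1e-3):
    """Aggregate reward over n pulls. For small p, the count of nonzero
    rewards is roughly Poisson(n * p), so the total is roughly a
    compound Poisson random variable."""
    return sum(sample_arm(p) for _ in range(n))
```

A BAI algorithm in this setting must distinguish arms from such samples, almost all of which are zero, which is what makes the rare-event asymptotics informative.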