Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, missing data also affects how these algorithms are implemented; the simplest way to handle it is to continue sampling according to the original bandit algorithm, ignoring missing outcomes. We investigate how this approach affects the performance of several bandit algorithms through an extensive simulation study, assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward, and consider different probabilities of missingness in both arms. The key finding of our work is that, when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses, because that arm is perceived as having less observed information and is therefore deemed more appealing by the algorithm than it would otherwise be. In contrast, algorithms that are geared towards exploitation rapidly assign a high value to samples from the arms with a currently high observed mean, irrespective of the number of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
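To make the two strategies concrete, the following is a minimal simulation sketch, not the study's actual code. It assumes Thompson sampling with uniform Beta(1, 1) priors as the bandit algorithm (the abstract does not name the algorithms compared), missingness that depends only on the arm (a simple special case of missing at random), and mean imputation implemented as a fractional pseudo-observation equal to the arm's current observed mean. The function name `run_trial` and all parameter names are hypothetical.

```python
import random


def run_trial(p_success, p_missing, n_patients, impute=False, seed=0):
    """Simulate one two-armed trial with binary outcomes that may go missing.

    p_success, p_missing: per-arm probabilities (length-2 sequences).
    impute=False ignores missing outcomes; impute=True applies mean imputation.
    Returns (allocations per arm, total true reward).
    """
    rng = random.Random(seed)
    successes = [0.0, 0.0]  # observed (or imputed) successes per arm
    failures = [0.0, 0.0]   # observed (or imputed) failures per arm
    allocations = [0, 0]
    total_reward = 0

    for _ in range(n_patients):
        # Thompson sampling: draw from each arm's Beta(1 + s, 1 + f) posterior
        # and allocate the patient to the arm with the larger draw.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in (0, 1)]
        arm = 0 if draws[0] >= draws[1] else 1
        allocations[arm] += 1

        # The true outcome counts towards the reward even if it is never recorded.
        outcome = 1 if rng.random() < p_success[arm] else 0
        total_reward += outcome

        if rng.random() < p_missing[arm]:
            if impute:
                # Assumed form of mean imputation: add a fractional observation
                # equal to the arm's current mean (0.5 if nothing observed yet).
                n_obs = successes[arm] + failures[arm]
                mean = successes[arm] / n_obs if n_obs > 0 else 0.5
                successes[arm] += mean
                failures[arm] += 1 - mean
            # Otherwise the missing outcome is ignored and nothing is updated.
        else:
            successes[arm] += outcome
            failures[arm] += 1 - outcome

    return allocations, total_reward
```

Calling `run_trial((0.3, 0.5), (0.0, 0.4), 100)` and again with `impute=True`, then averaging allocations and rewards over many seeds, gives a rough way to compare the two strategies when one arm loses more responses than the other. Under this form of imputation the arm's estimated mean is unchanged, while its effective sample size grows, which reduces the extra appeal that an under-observed arm has for an exploratory algorithm.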
