Multi armed bandits are one of the theoretical pillars of reinforcement learning. Recently, the investigation of quantum algorithms for multi armed bandit problems was started, and it was found that a quadratic speed-up is possible when the arms and the randomness of the rewards of the arms can be queried in superposition. Here we introduce further bandit models where we only have limited access to the randomness of the rewards, but we can still query the arms in superposition. We show that this impedes any speed-up of quantum algorithms.
翻译:多武装匪徒是强化学习的理论支柱之一。 最近,对多武装匪徒问题量子算法的调查已经开始,并发现当武器与武器奖励随机性在叠加状态中被问及时,就有可能进行二次加速。 在这里我们引入了更多的强盗模式,我们只能有限地获得奖励的随机性,但我们仍然可以询问在叠加状态中的武器。我们证明这阻碍了量子算法的加速。