The Whittle index policy is a powerful tool for obtaining asymptotically optimal solutions to the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains difficult for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandit by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning to train NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance on three recently studied restless bandit problems. Our experimental results show that NeurWIN performs significantly better than other RL algorithms.
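For readers unfamiliar with index policies, the following minimal sketch illustrates the two ideas the abstract relies on. It is not NeurWIN's implementation. It assumes the standard formulation in which, at each step, the arms with the largest indices are activated, and in which the "set of Markov decision problems" is a family of single-arm problems parameterized by a subsidy paid for staying passive. The function names, the `budget` parameter, and the example index values are all hypothetical.

```python
from typing import List, Sequence


def whittle_index_policy(indices: Sequence[float], budget: int) -> List[int]:
    """Return the positions of the `budget` arms with the largest indices.

    `indices` holds one index value per arm for its current state. In a
    learned setting these values would come from the index network; here
    they are plain numbers supplied by the caller (an assumption).
    """
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:budget])


def subsidy_control(index: float, subsidy: float) -> int:
    """Control for one arm in the subsidized single-arm problem.

    Returns 1 (activate) when the arm's index exceeds the passivity
    subsidy and 0 (stay passive) otherwise. This threshold rule is the
    assumed link between an index and a control decision.
    """
    return 1 if index > subsidy else 0


# Hypothetical index values for four arms; activate two of them.
active_arms = whittle_index_policy([0.2, 1.5, -0.3, 0.9], budget=2)
```

The threshold rule in `subsidy_control` is why a network that outputs an index can be trained as a controller: comparing its output against a subsidy yields an action, and the quality of that action can then be scored with reinforcement learning.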