We propose Streaming Bandits, a Restless Multi-Armed Bandit (RMAB) framework in which heterogeneous arms may arrive over time and leave the system after a finite lifetime. Streaming Bandits naturally capture the health intervention planning problem, in which health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day. Our contributions are as follows: (1) We derive conditions under which our problem satisfies indexability, a precondition that guarantees the existence and asymptotic optimality of the Whittle index solution for RMABs; we establish these conditions via a polynomial-time reduction of the Streaming Bandit setup to regular RMABs. (2) We further prove a phenomenon we call index decay, whereby Whittle index values are low for arms with short residual lifetimes; this observation drives the intuition underpinning our algorithm. (3) We propose a novel, efficient algorithm for computing the index-based solution for Streaming Bandits. Unlike previous methods, our algorithm does not rely on solving a costly finite-horizon problem on each arm of the RMAB, which lowers its computational complexity relative to existing methods. (4) Finally, we evaluate our approach via simulations on real-world data sets from a tuberculosis patient monitoring task and an intervention planning task for improving maternal healthcare, as well as on other synthetic domains. Across the board, our algorithm achieves a two-orders-of-magnitude speedup over existing methods while maintaining the same solution quality.
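To make the index-decay claim concrete, the following minimal Python sketch (an illustration, not the paper's algorithm) computes finite-horizon Whittle indices for a single two-state arm by bisection over the passive subsidy; the index shrinks toward zero as the residual lifetime shortens. The two-state model, reward function, and transition matrices are hypothetical assumptions chosen only for demonstration.

```python
import numpy as np

# Hypothetical two-state arm (e.g., 0 = non-adhering, 1 = adhering).
# Acting (intervening) improves the odds of reaching/staying in state 1.
P_PASSIVE = np.array([[0.9, 0.1],    # from state 0: mostly stays in state 0
                      [0.4, 0.6]])   # from state 1: may slip back to state 0
P_ACTIVE  = np.array([[0.5, 0.5],    # acting helps leave state 0
                      [0.1, 0.9]])   # acting helps stay in state 1
REWARD = np.array([0.0, 1.0])        # reward 1 in the good state

def q_values(state, horizon, subsidy):
    """Finite-horizon Q-values (passive, active) via backward value iteration;
    the passive action earns an extra per-step subsidy (Whittle's relaxation)."""
    v = np.zeros(2)                   # terminal value: 0 at end of lifetime
    for _ in range(horizon - 1):
        q_p = REWARD + subsidy + P_PASSIVE @ v
        q_a = REWARD + P_ACTIVE @ v
        v = np.maximum(q_p, q_a)
    q_p = REWARD[state] + subsidy + P_PASSIVE[state] @ v
    q_a = REWARD[state] + P_ACTIVE[state] @ v
    return q_p, q_a

def whittle_index(state, horizon, lo=-2.0, hi=2.0, tol=1e-6):
    """Break-even subsidy at which the passive action matches the active
    one, found by bisection (assumes the arm is indexable)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_p, q_a = q_values(state, horizon, mid)
        lo, hi = (lo, mid) if q_p >= q_a else (mid, hi)
    return 0.5 * (lo + hi)

for h in (1, 2, 3, 5, 10, 20):
    print(f"residual lifetime {h:2d}: index(state=0) = {whittle_index(0, h):.4f}")
```

With one step remaining, acting cannot change the immediate reward, so the index is essentially zero; as the residual lifetime grows, the index rises toward its infinite-horizon value. This is the decay pattern the abstract refers to.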