Restless and collapsing bandits are often used to model budget-constrained resource allocation in settings where arms have action-dependent transition probabilities, such as the allocation of health interventions among patients. However, state-of-the-art Whittle-index-based approaches to this planning problem either do not consider fairness among arms or incentivize fairness without guaranteeing it. We therefore introduce ProbFair, a probabilistically fair policy that maximizes total expected reward and satisfies the budget constraint while guaranteeing that each arm's probability of being pulled at every timestep is bounded below by a strictly positive value. We evaluate our algorithm on a real-world application, in which interventions support continuous positive airway pressure (CPAP) therapy adherence among patients, as well as on a broader class of synthetic transition matrices. We find that ProbFair preserves utility while providing fairness guarantees.
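To make the fairness constraint concrete, the following is a minimal sketch of how a policy could pull exactly k arms per timestep while giving each arm a fixed, strictly positive pull probability. It uses systematic sampling as a stand-in and is not claimed to be the procedure ProbFair itself uses. The function name `sample_arms`, the floor `ell`, and the example probabilities are hypothetical, and the sketch does not address how reward-maximizing probabilities would be computed.

```python
import numpy as np


def sample_arms(p, k, ell, rng):
    """Pull exactly k arms so that arm i is chosen with probability p[i].

    Assumptions (not taken from the abstract): p is a fixed vector of
    per-arm pull probabilities with ell <= p[i] <= 1 and sum(p) == k.
    """
    p = np.asarray(p, dtype=float)
    if not (0.0 < ell <= 1.0):
        raise ValueError("ell must be a strictly positive probability")
    if np.any(p < ell) or np.any(p > 1.0):
        raise ValueError("each p[i] must lie in [ell, 1]")
    if not np.isclose(p.sum(), k):
        raise ValueError("probabilities must sum to the budget k")

    # Lay the arms end to end on [0, k); arm i occupies a segment of length p[i].
    cum = np.cumsum(p)
    cum[-1] = k  # guard against floating-point drift in the last endpoint

    # One uniform offset, then k points spaced exactly 1 apart. Because every
    # segment has length at most 1, no arm can contain two points.
    points = rng.random() + np.arange(k)
    idx = np.searchsorted(cum, points, side="right")
    return np.minimum(idx, len(p) - 1)


# Hypothetical example: 5 arms, budget of 2 pulls, floor of 0.1 per arm.
rng = np.random.default_rng(0)
pulled = sample_arms([0.1, 0.2, 0.4, 0.5, 0.8], k=2, ell=0.1, rng=rng)
```

Each call returns the indices of the k arms pulled at one timestep. The budget holds on every draw, not just in expectation, and an arm with probability `ell` is still pulled in roughly an `ell` fraction of timesteps over a long horizon.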