We study the sequential decision-making problem of allocating a limited resource to agents that reveal their stochastic demands on arrival over a finite horizon. Our goal is to design fair allocation algorithms that exhaust the available resource budget. This is challenging in sequential settings, where information about future demands is unavailable at decision time. We formulate the problem as a discrete-time Markov decision process (MDP). We propose a new algorithm, SAFFE, that makes allocations that are fair with respect to the entire set of demands revealed over the horizon by accounting for expected future demands at each arrival time. The algorithm introduces regularization that allows currently revealed demands to be prioritized over potential future demands, depending on the uncertainty in agents' future demands. Using the MDP formulation, we show that SAFFE optimizes allocations based on an upper bound on the Nash Social Welfare fairness objective, and we bound its gap to optimality using concentration bounds on total future demands. On synthetic and real data, we compare SAFFE against existing approaches and against a reinforcement learning policy trained on the MDP. We show that SAFFE yields fairer and more efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.
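To give a rough sense of the underlying objective (this is an illustrative sketch, not the paper's exact SAFFE update), the snippet below allocates a remaining budget by water-filling, capping each agent at its currently revealed demand plus an estimate of its expected future demand; water-filling is the closed-form maximizer of the sum of log-allocations under such per-agent caps, which mirrors the Nash Social Welfare upper bound described above. The function name `nsw_waterfill` and its interface are our own hypothetical choices for illustration.

```python
import numpy as np

def nsw_waterfill(revealed, expected_future, budget):
    """Illustrative water-filling allocation (not the paper's exact algorithm).

    Each agent i is capped at revealed[i] + expected_future[i], i.e. its
    currently revealed demand plus an estimate of its future demand. Among
    agents with positive anticipated demand, the result maximizes
    sum_i log(x_i) subject to x_i <= cap_i and sum_i x_i <= budget.
    """
    caps = np.asarray(revealed, dtype=float) + np.asarray(expected_future, dtype=float)
    alloc = np.zeros_like(caps)
    active = caps > 0                      # agents that can still receive resource
    remaining = float(budget)
    while remaining > 1e-12 and active.any():
        share = remaining / active.sum()   # tentative equal split among active agents
        residual = caps - alloc
        capped = active & (residual <= share)
        if capped.any():
            # Agents whose cap is below the equal share receive their full cap.
            remaining -= residual[capped].sum()
            alloc[capped] = caps[capped]
            active &= ~capped
        else:
            # Every remaining agent can absorb the equal share; budget is spent.
            alloc[active] += share
            remaining = 0.0
    return alloc

# Example: 3 agents, budget 10, revealed demands plus expected future demands as caps.
print(nsw_waterfill(revealed=[2.0, 5.0, 1.0], expected_future=[1.0, 4.0, 0.5], budget=10.0))
```

The equal-split-then-cap loop recovers the capped log-utility optimum because the optimal allocation has the form x_i = min(cap_i, v) for a common water level v; agents capped in earlier rounds necessarily have caps below the final level, so the loop terminates after at most one round per agent.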