Fairness has emerged as an important concern in automated decision-making in recent years, especially when these decisions affect human welfare. In this work, we study fairness in temporally extended decision-making settings, specifically those formulated as Markov Decision Processes (MDPs). Our proposed notion of fairness ensures that each state's long-term visitation frequency is at least a specified fraction. This quota-based notion of fairness is natural in many resource-allocation settings where the dynamics of a single resource being allocated are governed by an MDP and the distribution of the shared resource is captured by its state-visitation frequency. In an average-reward MDP (AMDP) setting, we formulate the problem as a bilinear saddle point program and, given access to a generative model, solve it using a Stochastic Mirror Descent (SMD) based algorithm. The proposed solution guarantees simultaneous approximation of the expected average reward and the fairness requirement. We give sample complexity bounds for the proposed algorithm and validate our theoretical results with experiments on simulated data.
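To make the quota-based fairness notion concrete, the following is a minimal sketch (not the paper's SMD algorithm) that encodes a fairness-constrained average-reward MDP as a linear program over stationary state-action occupancy measures, using an assumed toy 2-state, 2-action MDP with hypothetical quotas `rho`. The constraint that each state's long-term visitation frequency be at least a specified fraction appears as a lower bound on the state marginals of the occupancy measure.

```python
import numpy as np
from scipy.optimize import linprog

n_s, n_a = 2, 2
# Toy MDP (assumed for illustration): P[s, a, s'] = transition probability
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0], [0.0, 0.5]])  # rewards r[s, a]
rho = np.array([0.2, 0.2])              # hypothetical fairness quotas per state

# Decision variable: occupancy measure x[s, a], flattened to index s*n_a + a.
# Flow-conservation constraints: for each s,
#   sum_a x[s, a] - sum_{s', a'} P[s', a', s] * x[s', a'] = 0
A_flow = np.zeros((n_s, n_s * n_a))
for s in range(n_s):
    for a in range(n_a):
        A_flow[s, s * n_a + a] += 1.0
    for sp in range(n_s):
        for ap in range(n_a):
            A_flow[s, sp * n_a + ap] -= P[sp, ap, s]

# Equality constraints: normalization (sum x = 1) plus flow conservation.
A_eq = np.vstack([np.ones(n_s * n_a), A_flow])
b_eq = np.concatenate([[1.0], np.zeros(n_s)])

# Fairness quotas: sum_a x[s, a] >= rho[s], written as -sum_a x[s, a] <= -rho[s].
A_ub = np.zeros((n_s, n_s * n_a))
for s in range(n_s):
    A_ub[s, s * n_a:(s + 1) * n_a] = -1.0
b_ub = -rho

# Maximize expected average reward <r, x> (linprog minimizes, so negate).
res = linprog(-r.flatten(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n_s * n_a), method="highs")
d = res.x.reshape(n_s, n_a).sum(axis=1)  # long-term state-visitation frequencies
print("state visitation frequencies:", d)
```

Under this formulation the quotas bind whenever the unconstrained reward-maximizing policy would visit some state less often than its quota; the bilinear saddle point studied in the paper arises from the Lagrangian of exactly this kind of constrained program, which SMD then solves from samples instead of exact LP data.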