Long-term fairness is an important consideration in designing and deploying learning-based decision systems in high-stakes decision-making contexts. Recent work has proposed using Markov Decision Processes (MDPs) to formulate decision-making with long-term fairness requirements in dynamically changing environments, and has demonstrated major challenges in directly deploying heuristic and rule-based policies that worked well in static settings. We show that policy optimization methods from deep reinforcement learning can find strictly better decision policies, often achieving both higher overall utility and fewer violations of the fairness requirements than previously known strategies. In particular, we propose new methods for imposing fairness requirements in policy optimization by regularizing the advantage estimates of different actions. Our methods make it easy to impose fairness constraints without reward engineering or sacrificing training efficiency. We perform detailed analyses in three established case studies: attention allocation in incident monitoring, bank loan approval, and vaccine distribution in population networks.
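The core idea of advantage regularization can be sketched as follows. This is a minimal illustrative example, not the paper's exact formulation: the function name, the per-step fairness-violation signal, and the coefficient `beta` are all assumptions made for the sketch.

```python
import numpy as np

def fairness_regularized_advantages(advantages, fairness_violations, beta=1.0):
    """Penalize each action's advantage estimate by its fairness impact.

    advantages: per-step advantage estimates A(s, a) from the critic
    fairness_violations: per-step, nonnegative estimate of how much the
        action worsens the long-term fairness gap (hypothetical signal)
    beta: regularization strength (illustrative hyperparameter)
    """
    return advantages - beta * np.asarray(fairness_violations)

# Toy example: two actions with equal utility advantage; the second
# action worsens the fairness gap, so its effective advantage drops
# and the policy gradient shifts probability mass toward the first.
adv = np.array([1.0, 1.0])
viol = np.array([0.0, 0.5])
reg = fairness_regularized_advantages(adv, viol, beta=2.0)
print(reg)
```

Because the penalty enters through the advantage term rather than the reward, it plugs into standard policy-gradient updates (e.g. a PPO-style surrogate objective) without modifying the environment's reward function.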