AI methods are used in societally important settings, ranging from credit to employment to housing, and it is crucial to ensure fairness in algorithmic decision making. Moreover, many settings are dynamic, with populations responding to sequential decision policies. We introduce the study of reinforcement learning (RL) with stepwise fairness constraints, which require group fairness at each time step. Our focus is on tabular episodic RL, and we provide learning algorithms with strong theoretical guarantees on policy optimality and fairness violation. Our framework offers useful tools for studying the impact of fairness constraints in sequential settings and raises new challenges in RL.
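To make the notion of a stepwise fairness constraint concrete, here is a minimal sketch, not the paper's algorithm: at each time step, the policy's probability of taking the favorable action must be nearly equal across demographic groups, in the style of demographic parity. The group labels "A"/"B", the `eps` tolerance, and the function name are all illustrative assumptions.

```python
def stepwise_fairness_violation(policy_probs, eps=0.05):
    """Measure violation of a stepwise demographic-parity-style constraint.

    policy_probs: list over time steps; each entry maps a group label to
    the policy's probability of taking the favorable action for that group.
    Returns the total violation, summed over steps: at each step we count
    only the excess of the between-group gap over the tolerance eps.
    """
    total = 0.0
    for step in policy_probs:
        gap = abs(step["A"] - step["B"])      # between-group disparity at this step
        total += max(0.0, gap - eps)          # only gaps beyond eps count as violation
    return total

# A policy that is (approximately) fair at every step has zero violation.
fair = [{"A": 0.50, "B": 0.52}, {"A": 0.30, "B": 0.28}]

# A policy that strongly favors group A at the second step accumulates
# violation there, even though its first step is fair.
unfair = [{"A": 0.50, "B": 0.52}, {"A": 0.80, "B": 0.40}]
```

The key distinction from a single aggregate fairness constraint is that the check is applied per time step, so a policy cannot offset unfair treatment early in an episode with compensating decisions later.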