We propose a novel formulation of group fairness with biased feedback in the contextual multi-armed bandit (CMAB) setting. In the CMAB setting, a sequential decision maker must, at each time step, choose an arm to pull from a finite set of arms after observing a context for each potential arm pull. In our model, arms are partitioned into two or more sensitive groups based on protected features (e.g., age, race, or socio-economic status). Rewards received from pulling an arm may be distorted by an unknown societal or measurement bias. We assume that, despite the biased feedback the agent receives, these groups are in reality equal. To alleviate this distortion, we learn a societal bias term that can be used both to identify the source of bias and to potentially correct the problem outside of the algorithm. We present a novel algorithm that accommodates this notion of fairness for an arbitrary number of groups, and we provide a theoretical bound on its regret. We validate our algorithm on synthetic data and two real-world datasets in intervention settings where we want to allocate resources fairly across groups.
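To make the setting concrete, the following is a minimal simulation sketch, not the paper's algorithm: it assumes an additive per-group distortion on observed rewards and a shared true reward parameter across groups (so the groups are "equal" in reality), and it uses a crude alternating scheme to estimate both the reward parameter and a per-group bias term. All names (`group_of_arm`, `bias_hat`, the noise levels, etc.) are illustrative assumptions.

```python
# Illustrative sketch of a biased-feedback CMAB round (not the proposed method).
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_groups, d, T = 6, 2, 5, 2000
group_of_arm = rng.integers(n_groups, size=n_arms)  # sensitive-group label of each arm
theta_true = rng.normal(size=d)                      # shared reward parameter: groups are truly equal
bias_true = np.array([0.0, -0.5])                    # unknown societal/measurement bias per group

A = np.eye(d)                 # ridge-regression Gram matrix for contexts
b = np.zeros(d)
bias_sum = np.zeros(n_groups) # running sums for the per-group bias estimate
bias_cnt = np.zeros(n_groups)

for t in range(T):
    X = rng.normal(size=(n_arms, d))                 # one context vector per arm at this step
    theta_hat = np.linalg.solve(A, b)                # reward estimate fit on bias-corrected targets
    bias_hat = np.divide(bias_sum, np.maximum(bias_cnt, 1))
    # score arms with the debiased reward model, plus small exploration noise
    scores = X @ theta_hat + 0.1 * rng.normal(size=n_arms)
    arm = int(np.argmax(scores))
    g = group_of_arm[arm]
    # observed (biased) feedback: true reward plus the group's unknown distortion
    reward_obs = X[arm] @ theta_true + bias_true[g] + 0.1 * rng.normal()
    # crude alternating update: attribute the residual to the group's bias term,
    # then refit theta on the bias-corrected target
    bias_sum[g] += reward_obs - X[arm] @ theta_hat
    bias_cnt[g] += 1
    target = reward_obs - bias_hat[g]
    A += np.outer(X[arm], X[arm])
    b += target * X[arm]

print("estimated per-group bias:", np.divide(bias_sum, np.maximum(bias_cnt, 1)))
```

Under these assumptions, the learned bias estimates can be inspected directly, which is the sense in which a bias term can point to the source of distortion outside of the bandit algorithm itself.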