Fairness-aware machine learning seeks to maximise the utility of predictions while avoiding unfair discrimination based on sensitive attributes such as race, sex, and religion. An important line of work in this field enforces fairness during the training step of a classifier. A simple yet effective binary classification algorithm that follows this strategy is two-naive-Bayes (2NB), which enforces statistical parity: the requirement that every group in the dataset receives positive labels with the same probability. In this paper, we generalise this algorithm into N-naive-Bayes (NNB), removing the simplifying assumption that the data contain only two sensitive groups so that it can be applied to an arbitrary number of groups. We propose extensions of the original algorithm's statistical parity constraint and of its post-processing routine that enforce statistical independence between the label and a single sensitive attribute. Then, we investigate the application of NNB to data with multiple sensitive features and propose a new constraint and post-processing routine to enforce differential fairness, an extension of established group-fairness constraints that focuses on intersectionality. We empirically demonstrate the effectiveness of the NNB algorithm on US Census datasets and compare its accuracy and debiasing performance, as measured by disparate impact and DF-$\epsilon$ score, with those of similar group-fairness algorithms. Finally, we lay out important considerations users should be aware of before incorporating this algorithm into their applications, and direct them to further reading on the pros, cons, and ethical implications of using statistical parity as a fairness criterion.
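The two evaluation metrics named above can be made concrete with a short sketch. It is not the paper's code and rests on several assumptions: predictions are binary (0/1); disparate impact is generalised to N groups as the ratio of the lowest to the highest positive-prediction rate (one common convention, not necessarily the one used in the paper); and DF-$\epsilon$ is estimated empirically as the largest absolute log-ratio of outcome probabilities between any two groups, with a hypothetical additive smoothing parameter `alpha` that keeps every probability strictly between 0 and 1. The function names `positive_rates`, `disparate_impact`, and `df_epsilon` are illustrative only.

```python
import math
from collections import Counter


def positive_rates(y_pred, groups, alpha=0.0):
    """Estimate P(y_hat = 1 | group) for each group.

    `groups` may hold single labels (e.g. "female") or tuples of labels
    (e.g. ("female", "black")) to represent intersectional groups.
    `alpha` is an additive (Dirichlet) smoothing count added to each of the
    two outcomes; alpha = 0 gives the raw empirical rate.
    """
    totals = Counter(groups)
    positives = Counter(g for y, g in zip(y_pred, groups) if y == 1)
    return {g: (positives[g] + alpha) / (n + 2 * alpha) for g, n in totals.items()}


def disparate_impact(y_pred, groups):
    """Ratio of the lowest to the highest positive-prediction rate across groups.

    1.0 means identical rates (statistical parity); values near 0 mean
    some group almost never receives the positive label.
    """
    rates = positive_rates(y_pred, groups).values()
    highest = max(rates)
    return min(rates) / highest if highest > 0 else 1.0


def df_epsilon(y_pred, groups, alpha=1.0):
    """Empirical differential-fairness epsilon for binary predictions.

    Returns the largest |log P(y | s_i) - log P(y | s_j)| over both outcomes
    y in {0, 1} and all pairs of groups s_i, s_j. Smoothing (alpha > 0)
    keeps every probability strictly between 0 and 1 so the logs are finite.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to avoid log(0)")
    p_pos = positive_rates(y_pred, groups, alpha)
    eps = 0.0
    for a in p_pos:
        for b in p_pos:
            # Compare the positive-outcome and the negative-outcome probabilities.
            for pa, pb in ((p_pos[a], p_pos[b]), (1 - p_pos[a], 1 - p_pos[b])):
                eps = max(eps, abs(math.log(pa) - math.log(pb)))
    return eps


if __name__ == "__main__":
    y_hat = [1, 1, 0, 0, 1, 0, 0, 0]
    sex = ["f", "f", "m", "m", "f", "f", "m", "m"]
    race = ["x", "y", "x", "y", "x", "y", "x", "y"]
    groups = list(zip(sex, race))  # intersectional (sex, race) groups
    print(disparate_impact(y_hat, groups), df_epsilon(y_hat, groups))
```

Passing tuples such as `(sex, race)` as the group key is what turns a single-attribute check into an intersectional one: each combination of sensitive values is treated as its own group. Under statistical parity every group has the same positive rate, so disparate impact equals 1 and DF-$\epsilon$ is 0 on the raw rates; smoothing can leave a small non-zero $\epsilon$ when group sizes differ.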