Predictive parity (PP), also known as sufficiency, is a core definition of algorithmic fairness requiring that model outputs carry the same meaning in terms of expected outcomes regardless of group. Testing for and satisfying PP is especially important in settings where model scores are interpreted by humans or directly grant access to opportunity, such as healthcare or banking. Remedies for PP violations have primarily been studied through the lens of model calibration. However, we find that existing calibration-based tests and mitigation methods are designed for independent data, an assumption that often fails in large-scale applications such as social media or medical testing. In this work, we address this issue by developing a statistically rigorous, non-parametric, regression-based test for PP with dependent observations. We then apply our test to illustrate that the conclusions of PP testing can vary significantly depending on whether independence or dependence is assumed. Lastly, we propose a mitigation method that yields a minimally biased post-processing transformation function to achieve PP.
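For context, a common independence-based calibration check for PP compares group-wise outcome rates within score bins. The sketch below is purely illustrative, uses hypothetical variable names, and is not the dependence-aware regression test developed in this paper; it shows the kind of i.i.d.-style check the abstract refers to.

```python
# Illustrative sketch only: a naive, independence-assuming calibration check for
# predictive parity (sufficiency). All names here are hypothetical; this is NOT
# the dependence-aware non-parametric regression test proposed in the paper.
import numpy as np

def groupwise_calibration_gap(scores, outcomes, groups, n_bins=10):
    """Compare observed outcome rates across groups within each score bin.

    Sufficiency asks that E[Y | S, A] = E[Y | S]; under i.i.d. data a simple
    proxy is that, within each score bin, every group's outcome rate is close
    to the pooled rate. Returns the largest within-bin gap observed.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(scores, bins) - 1, 0, n_bins - 1)
    worst_gap = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.sum() == 0:
            continue
        pooled_rate = outcomes[in_bin].mean()
        for g in np.unique(groups):
            mask = in_bin & (groups == g)
            if mask.sum() == 0:
                continue
            worst_gap = max(worst_gap, abs(outcomes[mask].mean() - pooled_rate))
    return worst_gap

# Toy usage with synthetic, independent draws -- the very setting the paper
# argues is often not realistic for social-media or medical-testing data.
rng = np.random.default_rng(0)
s = rng.uniform(size=5000)                 # model scores
a = rng.integers(0, 2, size=5000)          # group labels
y = rng.binomial(1, s)                     # outcomes follow the score for both groups
print(groupwise_calibration_gap(s, y, a))  # small gap expected, i.e. PP roughly holds
```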