We examine the problem of obtaining fair outcomes for individuals who choose to share optional information with machine-learned models and those who do not consent and keep their data undisclosed. We find that these non-consenting users receive significantly lower prediction outcomes than is justified by their provided information alone. This observation gives rise to the overlooked problem of how to ensure that users who protect their personal data are not penalized. While statistical fairness notions focus on fair outcomes between advantaged and disadvantaged groups, these notions fail to protect the non-consenting users. To address this problem, we formalize protection requirements for models that (i) allow users to benefit from sharing optional information and (ii) do not penalize them if they keep their data undisclosed. We offer the first solution to this problem by proposing the notion of Optional Feature Fairness (OFF), which we prove to be loss-optimal under our protection requirements (i) and (ii). To learn OFF-compliant models, we devise a model-agnostic data augmentation strategy with finite-sample convergence guarantees. Finally, we extensively analyze OFF on a variety of challenging real-world tasks, models, and data sets with multiple optional features.
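The abstract does not specify how the model-agnostic data augmentation strategy operates, so the following is only a minimal illustrative sketch of one plausible form such an augmentation could take: each training example is duplicated with its optional (consent-based) features masked by a sentinel value, so the model also learns from the "undisclosed" view of the data. The function name, the masking convention, and the choice of estimator are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

# Illustrative sketch only: augment a tabular training set so the model also
# sees each example with its optional features withheld. The masking
# convention (np.nan) and helper name are hypothetical, not taken from the paper.

MASK = np.nan  # sentinel representing "feature not disclosed"

def augment_with_withheld(X, y, optional_idx):
    """Return the original samples plus copies with optional columns masked."""
    X_masked = X.astype(float).copy()
    X_masked[:, optional_idx] = MASK
    X_aug = np.vstack([X, X_masked])
    y_aug = np.concatenate([y, y])
    return X_aug, y_aug

# Usage sketch (assuming columns 3 and 4 hold optional features):
# X_aug, y_aug = augment_with_withheld(X_train, y_train, optional_idx=[3, 4])
# model.fit(X_aug, y_aug)  # any estimator that tolerates missing values,
#                          # e.g. gradient-boosted trees
```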