Machine learning models are often personalized with categorical attributes that are protected, sensitive, self-reported, or costly to acquire. In this work, we show that models personalized with group attributes can reduce performance at the group level. We propose formal conditions to ensure the "fair use" of group attributes in prediction tasks, which can be checked by training one additional model -- i.e., collective preference guarantees ensuring that each group that provides personal data receives a tailored gain in performance in return. We present sufficient conditions for fair use in empirical risk minimization and characterize failure modes that lead to fair use violations under standard practices in model development and deployment. We then present a comprehensive empirical study of fair use in clinical prediction tasks. Our results demonstrate the prevalence of fair use violations in practice and illustrate simple interventions to mitigate their harm.
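To make the collective preference criterion concrete, the following is a minimal sketch of a per-group gain check, assuming two already-trained scikit-learn-style classifiers: a personalized model that uses the group attribute and a generic model that omits it. The function name `fair_use_gains` and the data layout are illustrative assumptions, not the paper's implementation, and accuracy stands in for whatever group-level performance metric is of interest.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def fair_use_gains(personalized_clf, generic_clf,
                   X_with_attr, X_without_attr, y, groups):
    """Return the per-group performance gain from personalization.

    A negative gain for any group signals a fair use violation in the
    sense sketched above: that group would have been better served by
    the generic model that ignores its attribute.
    """
    gains = {}
    for g in np.unique(groups):
        mask = groups == g
        # Performance of the personalized model on this group.
        acc_personalized = accuracy_score(
            y[mask], personalized_clf.predict(X_with_attr[mask]))
        # Performance of the generic model (no group attribute) on the same group.
        acc_generic = accuracy_score(
            y[mask], generic_clf.predict(X_without_attr[mask]))
        gains[g] = acc_personalized - acc_generic
    return gains
```

In this sketch, the "one additional model" from the abstract corresponds to `generic_clf`; comparing it against the personalized model on held-out data for each group is what turns fair use into a checkable condition.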