To grant users greater authority over their personal data, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle within these regulations is data minimization, which urges companies and institutions to collect only data that are adequate and relevant for the purpose of the data analysis. In this work, we take a user-centric perspective on this regulation and let individual users decide which data they deem adequate and relevant to be processed by a machine-learned model. We require that users who decide to provide optional information should appropriately benefit from sharing their data, while users who rely on the mandate to leave their data undisclosed should not be penalized for doing so. This gives rise to the overlooked problem of fair treatment between individuals who provide additional information and those who choose not to. While the classical fairness literature focuses on fair treatment between advantaged and disadvantaged groups, an initial look at this problem through the lens of classical fairness notions reveals that they are incompatible with these desiderata. We offer a solution to this problem by proposing the notion of Optional Feature Fairness (OFF), which follows from our requirements. To operationalize OFF, we derive a multi-model strategy and a tractable logistic regression model. We analyze the effect and the cost of applying OFF on several real-world data sets.
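To give a concrete sense of what a multi-model strategy for optional features can look like, the following is a minimal, hypothetical sketch: one logistic regression is trained per disclosure pattern, and each user is scored only on the features they chose to share. The class name, column handling, missing-value encoding (NaN marks an undisclosed optional feature), and routing logic are illustrative assumptions, not the exact construction derived in the paper.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

class MultiModelOFF:
    """Illustrative sketch: one classifier per pattern of disclosed optional features."""

    def __init__(self, mandatory_cols, optional_cols):
        self.mandatory_cols = list(mandatory_cols)
        self.optional_cols = list(optional_cols)
        self.models = {}  # disclosure pattern (tuple of bools) -> fitted classifier

    def _pattern(self, row):
        # An optional feature counts as disclosed if its value is not missing.
        return tuple(not pd.isna(row[c]) for c in self.optional_cols)

    def _cols(self, pattern):
        # Mandatory features plus only those optional features that were disclosed.
        return self.mandatory_cols + [
            c for c, disclosed in zip(self.optional_cols, pattern) if disclosed
        ]

    def fit(self, X: pd.DataFrame, y: np.ndarray):
        patterns = [self._pattern(X.iloc[i]) for i in range(len(X))]
        for pat in set(patterns):
            mask = np.array([p == pat for p in patterns])
            self.models[pat] = LogisticRegression(max_iter=1000).fit(
                X.loc[mask, self._cols(pat)], y[mask]
            )
        return self

    def predict_proba(self, X: pd.DataFrame):
        # Route each user to the model trained on exactly their disclosed feature set
        # (patterns unseen during training would need a fallback, omitted here).
        scores = np.zeros(len(X))
        for i in range(len(X)):
            pat = self._pattern(X.iloc[i])
            scores[i] = self.models[pat].predict_proba(X.iloc[[i]][self._cols(pat)])[0, 1]
        return scores

Under this routing, a user who withholds an optional feature is never evaluated by a model that expects it, which is one way to read the requirement that non-disclosure must not be penalized; whether users who do share optional data appropriately benefit depends on how the per-pattern models are trained and calibrated, which the paper addresses with the OFF notion.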