A number of learning models deployed in consequential domains, such as those assisting legal, banking, hiring, and healthcare decisions, rely on potentially sensitive user information to carry out inference. Moreover, the complete set of features is typically required at inference time. This not only poses severe privacy risks for the individuals using these systems, but also demands massive human effort from companies and organizations to verify the correctness of the released information. This paper asks whether it is necessary to require \emph{all} input features for a model to return accurate predictions at test time, and shows that, in a personalized setting, each individual may need to release only a small subset of these features without affecting the final decisions. The paper also provides an efficient sequential algorithm that chooses which attributes each individual should provide. An evaluation over several learning tasks shows that individuals may be able to report as little as 10\% of their information while ensuring the same level of accuracy as a model that uses the complete users' information.
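As a rough illustration of the kind of sequential attribute-selection procedure described above, consider the following greedy sketch. This is an assumed instantiation for illustration only, not the algorithm proposed in the paper: the mean-imputation of withheld features, the confidence threshold, and the function name \texttt{sequential\_release} are all hypothetical design choices, and \texttt{model} stands for any trained scikit-learn-style classifier exposing \texttt{predict} and \texttt{predict\_proba}.

\begin{verbatim}
import numpy as np

def sequential_release(model, x_full, feature_means, confidence=0.9):
    # Greedy sketch (assumed, not the paper's algorithm): withheld
    # features are imputed with population means; at each step the
    # individual releases the single feature that most increases the
    # classifier's confidence, stopping once the prediction is both
    # confident and identical to the full-information decision.
    n = len(x_full)
    released = set()
    x = feature_means.copy()                 # nothing released yet
    target = model.predict([x_full])[0]      # full-information decision
    while len(released) < n:
        p = model.predict_proba([x])[0]
        if p.max() >= confidence and model.predict([x])[0] == target:
            break                            # a small subset suffices
        best_j, best_conf = None, -np.inf
        for j in set(range(n)) - released:   # try each unreleased feature
            trial = x.copy()
            trial[j] = x_full[j]
            c = model.predict_proba([trial])[0].max()
            if c > best_conf:
                best_j, best_conf = j, c
        released.add(best_j)
        x[best_j] = x_full[best_j]
    return released                          # indices the individual reports
\end{verbatim}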