In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that provide uncertainty quantification -- they provably cover the true response with a user-specified probability, such as 90%. One might hope that when used with privately-trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction; we use holdout data to calibrate the size of the prediction sets but preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced to preserve privacy in order to guarantee correct coverage. We evaluate the method with experiments on the CIFAR-10, ImageNet, and CoronaHack datasets.
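To make the calibration step concrete, below is a minimal sketch of differentially private split conformal calibration. It assumes the private quantile is computed with the exponential mechanism over binned conformal scores and that the nominal quantile level is inflated by a heuristic margin to compensate for the privacy noise; the function names, binning scheme, and the exact correction are illustrative and do not reproduce the paper's algorithm.

```python
import numpy as np

def private_quantile(scores, q, epsilon, lower=0.0, upper=1.0, n_bins=1000):
    """Differentially private empirical quantile via the exponential mechanism.

    Scores are assumed to lie in [lower, upper]. Each candidate bin edge gets a
    utility equal to minus the gap between its empirical rank and the target
    rank, which has sensitivity 1 under adding/removing one calibration point.
    """
    scores = np.clip(scores, lower, upper)
    edges = np.linspace(lower, upper, n_bins + 1)
    target = q * len(scores)
    ranks = np.searchsorted(np.sort(scores), edges, side="right")
    utility = -np.abs(ranks - target)
    # Exponential mechanism: sample an edge with probability ∝ exp(eps * u / 2).
    logits = (epsilon * utility) / 2.0
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(edges, p=probs)

def private_conformal_threshold(cal_scores, alpha, epsilon):
    """Calibrate a DP score threshold for split conformal prediction.

    The nominal level (1 - alpha) is inflated slightly (illustrative constant
    here) so prediction sets still cover despite the privacy noise; the paper's
    correction is derived exactly rather than set heuristically.
    """
    n = len(cal_scores)
    q = min(1.0, (1 - alpha) * (n + 1) / n + 0.01)  # illustrative inflation
    return private_quantile(np.asarray(cal_scores), q, epsilon)

# Usage sketch: cal_scores[i] = 1 - softmax probability of the true label on
# holdout point i; tau = private_conformal_threshold(cal_scores, 0.1, 1.0);
# the prediction set for a new input x is {y : 1 - softmax(x)[y] <= tau}.
```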