In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that provide uncertainty quantification -- they provably cover the true response with a user-specified probability, such as 90%. One might hope that when used with privately trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately, this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction: we use holdout data to calibrate the size of the prediction sets, but preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced to preserve privacy in order to guarantee correct coverage. We evaluate the method on large-scale computer vision datasets.
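To make the calibration step concrete, the sketch below illustrates split conformal calibration with a differentially private quantile computed via the exponential mechanism. This is not the paper's exact algorithm: the function names, the fixed discretization grid, and the small quantile inflation used to absorb privatization noise are illustrative assumptions (the paper derives the exact correction).

```python
import numpy as np

def private_quantile(scores, q, epsilon, bins, rng):
    """Differentially private quantile via the exponential mechanism (illustrative).

    Candidates are fixed bin edges; a candidate's utility is minus the gap
    between its rank among the scores and the target rank q*n. Adding or
    removing one record shifts any rank by at most 1, so sensitivity is 1.
    """
    n = len(scores)
    target = q * n
    ranks = np.searchsorted(np.sort(scores), bins)  # rank of each candidate
    utility = -np.abs(ranks - target)
    logits = epsilon * utility / 2.0       # P(candidate) ∝ exp(eps·u/2)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(bins, p=probs)

def private_conformal_threshold(model_probs, labels, alpha, epsilon, seed=0):
    """Split-conformal calibration with a private quantile (sketch).

    model_probs: (n, K) softmax outputs on a holdout set
    labels:      (n,)   true classes
    Returns a score threshold t; test-time prediction sets are
    {y : 1 - softmax(x)_y <= t}.
    """
    rng = np.random.default_rng(seed)
    # Conformal score: 1 minus the probability assigned to the true class.
    scores = 1.0 - model_probs[np.arange(len(labels)), labels]
    # Inflate the quantile level slightly to compensate for privatization
    # noise; the 0.02 margin is an illustrative placeholder.
    q = min(1.0, (1 - alpha) + 0.02)
    bins = np.linspace(0.0, 1.0, 1001)  # these scores live in [0, 1]
    return private_quantile(scores, q, epsilon, bins, rng)
```

At test time, a prediction set for a new input is simply every label whose score falls below the calibrated threshold; coverage then holds marginally over the holdout draw and the mechanism's randomness.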