We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to $P$ while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data. We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on $\{1,\ldots ,k\}$, arbitrary product distributions on $\{0,1\}^d$, and product distributions on $\{0,1\}^d$ with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of $P$ nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
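For the first family, arbitrary distributions on $\{1,\ldots,k\}$, a minimal sketch of one standard baseline private sampler (illustrative only, not necessarily the algorithm achieving the tight bounds above): perturb the empirical histogram with Laplace noise of scale $2/\varepsilon$, clip to nonnegative values, renormalize, and draw a single observation. By post-processing, the released sample satisfies $\varepsilon$-differential privacy. All function and parameter names below are hypothetical.

```python
import random

def private_sample(data, k, epsilon, rng=None):
    """Draw one sample from a privately perturbed empirical histogram.

    data: list of observations in {0, ..., k-1}; epsilon: privacy budget.
    """
    rng = rng or random.Random()
    # Empirical histogram of the dataset.
    counts = [0] * k
    for x in data:
        counts[x] += 1
    # Laplace(2/epsilon) noise per bin (sensitivity 2 under swapping one
    # record); a Laplace variate is a difference of two exponentials.
    noisy = [
        c + rng.expovariate(epsilon / 2) - rng.expovariate(epsilon / 2)
        for c in counts
    ]
    # Project onto the nonnegative orthant and renormalize.
    noisy = [max(v, 0.0) for v in noisy]
    total = sum(noisy)
    if total == 0:
        return rng.randrange(k)  # degenerate case: fall back to uniform
    # Inverse-CDF sampling from the normalized noisy histogram.
    r = rng.random() * total
    acc = 0.0
    for i, v in enumerate(noisy):
        acc += v
        if r < acc:
            return i
    return k - 1
```

With enough data the noisy histogram concentrates around $P$, so the output distribution is close to $P$ in total variation; the abstract's results characterize exactly how large $n$ must be for this closeness, and show regimes where sampling is strictly cheaper than privately learning $P$.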