We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process while simultaneously relaxing the perfect privacy requirement. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that, in the limit of large file sizes, this trade-off can be captured via a novel information-theoretic formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset are unknown, we propose a new deep learning framework that leverages a generative adversarial network approach, allowing the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. On MNIST, CIFAR-10, and LSUN, the data-driven approach significantly outperforms a non-learning-based scheme that combines source coding with the download of multiple files.