Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once, and we can then reuse this summary throughout generator training without further privacy loss. An important question in this framework is, then, which features are useful for distinguishing between real and synthetic data distributions, and whether they enable us to generate high-quality synthetic data. This work considers using the features of $\textit{neural tangent kernels}$ (NTKs), more precisely $\textit{empirical}$ NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of perceptual features from models pre-trained on public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods without relying on any public data, as demonstrated on several tabular and image benchmark datasets.
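To make the "privatize once, reuse freely" idea concrete, below is a minimal sketch (not the paper's implementation) of MMD-based DP data generation with a finite-dimensional feature map. The names `feature_map` (e.g., an e-NTK feature extractor of an untrained network) and the feature normalization and sensitivity bound are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: one-shot DP mean embedding + MMD loss for generator training.
# All function names and the sensitivity/noise choices here are hypothetical.
import torch

def dp_mean_embedding(real_data, feature_map, noise_multiplier):
    """Privatize the real-data mean embedding once via the Gaussian mechanism."""
    feats = feature_map(real_data)                    # shape: (n, d)
    feats = feats / feats.norm(dim=1, keepdim=True)   # unit norm bounds per-sample sensitivity
    mean_emb = feats.mean(dim=0)                      # shape: (d,)
    sensitivity = 2.0 / real_data.shape[0]            # L2 sensitivity of a mean of unit-norm vectors
    noise = torch.randn_like(mean_emb) * noise_multiplier * sensitivity
    return mean_emb + noise                           # released once; reused at every training step

def mmd_loss(dp_emb, synthetic_data, feature_map):
    """Squared distance between the private embedding and the synthetic-data embedding."""
    syn_feats = feature_map(synthetic_data)
    syn_feats = syn_feats / syn_feats.norm(dim=1, keepdim=True)
    return ((dp_emb - syn_feats.mean(dim=0)) ** 2).sum()
```

Because the noisy mean embedding is released a single time, every subsequent generator update that touches only `dp_emb` is post-processing and incurs no additional privacy cost.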