Tuning all the hyperparameters of differentially private (DP) machine learning (ML) algorithms often requires the use of sensitive data, which may leak private information via the hyperparameter values. Recently, Papernot and Steinke (2022) proposed a class of DP hyperparameter tuning algorithms in which the number of random search samples is itself randomized. However, these algorithms still considerably increase the DP privacy parameter $\varepsilon$ over non-tuned DP ML model training, and they can be computationally heavy, as evaluating each hyperparameter candidate requires a new training run. We focus on lowering both the DP bounds and the computational cost of these methods by using only a random subset of the sensitive data for the hyperparameter tuning and extrapolating the optimal values from the small dataset to the larger one. We provide a R\'enyi differential privacy analysis of the proposed method and experimentally show that it consistently leads to a better privacy-utility trade-off than the baseline method of Papernot and Steinke (2022).
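To make the described scheme concrete, the following is a minimal Python sketch of subsampled DP hyperparameter tuning with a randomized number of random-search candidates, in the spirit of Papernot and Steinke (2022). The helpers \texttt{dp\_train} and \texttt{sample\_params}, the Poisson choice for the run-count distribution, and the toy objective are all illustrative assumptions, not the authors' actual implementation.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def sample_params():
    """Draw one random hyperparameter candidate (toy: a learning rate)."""
    return {"lr": 10 ** rng.uniform(-4, -1)}

def dp_train(params, data):
    """Stand-in for a single DP training run; returns a utility score.
    (Toy objective only; a real run would train a model under DP-SGD.)"""
    return -abs(np.log10(params["lr"]) + 2.5) + rng.normal(scale=0.1)

def dp_tune_on_subset(data, subset_frac=0.1, mean_runs=10):
    """Random-search tuning run on a random subset of the sensitive data,
    with the number of candidates itself randomized (here: Poisson)."""
    n = len(data)
    idx = rng.choice(n, size=max(1, int(subset_frac * n)), replace=False)
    subset = data[idx]

    # Randomizing the number of training runs is what makes the tuning
    # procedure itself differentially private in this class of methods.
    k = rng.poisson(mean_runs)

    best_params, best_score = None, -np.inf
    for _ in range(k):
        params = sample_params()          # i.i.d. candidate draw
        score = dp_train(params, subset)  # one DP training run on the subset
        if score > best_score:
            best_params, best_score = params, score

    if best_params is None:               # k == 0: no run was performed,
        best_params = sample_params()     # fall back to an untrained draw
    return best_params

data = rng.normal(size=(10_000, 5))       # stand-in sensitive dataset
best = dp_tune_on_subset(data)
# `best` would then be extrapolated from the small subset to the full
# dataset and used for a single final DP training run on all the data.
print(best)
\end{verbatim}

Tuning on a subset reduces both the per-run compute and, via subsampling amplification, the privacy cost of the tuning runs; only the final training run pays the full-data privacy price.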