Differentially private mechanisms for text generation typically add carefully calibrated noise to input words and output the nearest neighbor of the noised input. When the noise is small in magnitude, these mechanisms are susceptible to reconstruction of the original sensitive text, because the nearest neighbor of the noised input is likely to be the original input. To mitigate this empirical privacy risk, we propose a novel class of differentially private mechanisms that parameterizes the nearest neighbor selection criterion of traditional mechanisms. Motivated by the Vickrey auction, where only the second highest price is revealed and the highest price is kept private, the proposed mechanisms balance the choice between the first and second nearest neighbors using a tuning parameter. This parameter is selected by empirically solving a constrained optimization problem that maximizes utility while maintaining the desired privacy guarantees. We argue that this empirical measurement framework can be used to align different mechanisms along a common benchmark for their privacy-utility tradeoff, particularly when different distance metrics are used to calibrate the amount of noise added. Our experiments on real text classification datasets show up to 50% improvement in utility compared to the existing state of the art with the same empirical privacy guarantee.
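The selection step described above can be illustrated with a minimal sketch. Several details are assumptions made for illustration and are not taken from the abstract: the toy two-dimensional `EMBEDDINGS` table, the noise distribution (density proportional to exp(-epsilon * ||z||), sampled as a uniform direction times a Gamma-distributed magnitude), the Euclidean metric, and the rule that the second nearest neighbor is chosen with probability `t`. The actual mechanism may tie the selection probability to the distances involved; this sketch only shows the idea of trading off the first and second neighbors through one tuning parameter.

```python
import math
import random

# Toy 2-D word embeddings; hypothetical values for illustration only.
EMBEDDINGS = {
    "good": (1.0, 0.0),
    "great": (0.9, 0.2),
    "fine": (0.7, -0.3),
    "bad": (-1.0, 0.1),
    "awful": (-0.9, -0.2),
}


def sample_noise(dim, epsilon, rng):
    """Sample noise with density proportional to exp(-epsilon * ||z||).

    Assumed noise model: a uniformly random direction scaled by a
    Gamma(dim, 1/epsilon) magnitude. Smaller epsilon means larger noise.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def two_nearest(vector):
    """Return the two vocabulary words closest to `vector` (Euclidean)."""
    ranked = sorted(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], vector))
    return ranked[0], ranked[1]


def privatize(word, epsilon, t, rng):
    """Noise the word's embedding, then pick the first or second neighbor.

    Assumed selection rule: return the second nearest neighbor with
    probability t, otherwise the first. Setting t = 0 recovers the
    traditional nearest-neighbor mechanism.
    """
    noise = sample_noise(len(EMBEDDINGS[word]), epsilon, rng)
    noised = [x + n for x, n in zip(EMBEDDINGS[word], noise)]
    first, second = two_nearest(noised)
    return second if rng.random() < t else first


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        outputs = [privatize("good", 5.0, t, rng) for _ in range(10)]
        print(f"t={t}: {outputs}")
```

With very small noise (large `epsilon`) and `t = 0`, the sketch returns the input word itself, which is the reconstruction risk the abstract describes. Raising `t` shifts probability toward the second nearest neighbor, at some cost in utility.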