The notion of replicable algorithms was introduced by Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy (DP), and replicability for a broad class of statistical problems. Conversely, we show that any such equivalence must break down computationally: there exist statistical problems that are easy to solve under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $\delta$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
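To make the definition of replicability concrete, the following is a minimal sketch (not the paper's algorithm) of the shared-randomness rounding idea behind replicable statistical queries in Impagliazzo et al. [STOC '22]: an empirical mean is snapped to a grid whose random offset is derived from the algorithm's fixed randomness, so two runs with the same randomness on fresh i.i.d. samples return the same value with high probability. The function name `replicable_mean`, the tolerance `tau`, and the sample sizes are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def replicable_mean(sample, r, tau=0.1):
    """Estimate a mean replicably via shared-randomness rounding.

    The randomness r fixes a grid offset; the empirical mean is rounded
    to the nearest point of the grid {offset + 2*tau*k}. If two empirical
    means from independent samples differ by much less than tau, they
    usually land in the same grid cell and the two outputs are identical.
    """
    rng = np.random.default_rng(r)        # internal randomness, fixed by r
    offset = rng.uniform(0.0, 2.0 * tau)  # random grid offset, shared across runs
    est = float(np.mean(sample))          # empirical estimate from this sample
    return offset + 2.0 * tau * round((est - offset) / (2.0 * tau))

# Two runs with the same randomness r on independent samples typically agree:
gen = np.random.default_rng(0)
s1 = gen.normal(loc=0.5, scale=1.0, size=10_000)
s2 = gen.normal(loc=0.5, scale=1.0, size=10_000)
print(replicable_mean(s1, r=42) == replicable_mean(s2, r=42))  # True w.h.p.
```

The chance of disagreement is roughly the gap between the two empirical means divided by the grid width $2\tau$, which is where a sample-complexity analysis of the kind developed in the paper enters: larger samples shrink the gap and hence boost the replicability probability.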