Replicability is essential in science as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell ('22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied to two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness. An algorithm satisfies this form of replicability if it typically produces the same output when applied to two i.i.d. inputs (without fixing the internal randomness). This variant is called global stability and was introduced by Bun, Livni and Moran ('20) in the context of differential privacy. Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly, where the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. In addition, our results imply that, besides trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized. The proof of the impossibility result is based on a topological fixed-point theorem: for every algorithm, we locate a "hard input distribution" by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.
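To make the three notions concrete, the following is a minimal sketch of the formal definitions; the notation ($A$ for the learning algorithm, $S, S' \sim \mathcal{D}^n$ for i.i.d. samples, $r$ for internal randomness, and parameters $\rho$, $\eta$, $L$) is chosen here for illustration, and the paper's exact parameters and quantifiers may differ. Replicability in the sense of Impagliazzo et al. fixes the shared randomness across both runs:
\[
\Pr_{S, S' \sim \mathcal{D}^n,\; r}\bigl[A(S; r) = A(S'; r)\bigr] \;\ge\; 1 - \rho.
\]
Global stability drops the shared randomness and asks for a single canonical output:
\[
\exists\, h:\quad \Pr_{S \sim \mathcal{D}^n}\bigl[A(S) = h\bigr] \;\ge\; \eta.
\]
List replicability relaxes this to a fixed finite list of candidate outputs, which is the form that admits boosting:
\[
\exists\, h_1, \dots, h_L:\quad \Pr_{S \sim \mathcal{D}^n}\bigl[A(S) \in \{h_1, \dots, h_L\}\bigr] \;\ge\; 1 - \rho.
\]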