Replicability and stability in learning

Replicability is essential in science as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell ('22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied to two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness. An algorithm satisfies this form of replicability if it typically produces the same output when applied to two i.i.d. inputs (without fixing the internal randomness). This variant is called global stability and was introduced by Bun, Livni and Moran ('20) in the context of differential privacy. Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly, where the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. Our results also imply that, apart from trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized. The proof of the impossibility result is based on a topological fixed-point theorem. For every algorithm, we are able to locate a "hard input distribution" by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.
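To make the two definitions concrete, the following is a minimal toy sketch, not a construction from this work. It assumes a simple task (estimating the bias of a coin and reporting it on a coarse grid) and uses a random-grid rounding trick as an illustrative stand-in for a replicable algorithm. The names `rounded_mean` and `agreement_rate`, the grid width, and the sample sizes are all hypothetical choices. The sketch estimates how often two runs on independent samples agree in three settings: shared internal randomness (replicability), independent internal randomness (the global-stability criterion applied to the same randomized algorithm), and a deterministic fixed grid evaluated on a distribution whose mean sits on a grid boundary.

```python
import math
import random


def sample(p, n, rng):
    """Draw n i.i.d. coin flips with bias p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def rounded_mean(data, width, offset):
    """Report the centre of the grid cell (cell size `width`, grid shifted
    by `offset`) that contains the empirical mean of the data."""
    mean = sum(data) / len(data)
    cell = math.floor((mean - offset) / width)
    return round(offset + (cell + 0.5) * width, 9)


def agreement_rate(p, n, width, trials, mode, seed=0):
    """Estimate the probability that two runs on independent samples agree.

    mode = "shared":      both runs use the same random grid offset
                          (same internal randomness, as in replicability).
    mode = "independent": each run draws its own offset
                          (internal randomness is not fixed).
    mode = "fixed":       deterministic algorithm, offset is always 0.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        if mode == "fixed":
            off1 = off2 = 0.0
        else:
            off1 = rng.uniform(0, width)
            off2 = off1 if mode == "shared" else rng.uniform(0, width)
        out1 = rounded_mean(sample(p, n, rng), width, off1)
        out2 = rounded_mean(sample(p, n, rng), width, off2)
        agree += out1 == out2
    return agree / trials


if __name__ == "__main__":
    # p = 0.4 lies exactly on a boundary of the fixed grid of width 0.2,
    # so it plays the role of a hard input for the deterministic variant.
    for mode in ("shared", "independent", "fixed"):
        rate = agreement_rate(p=0.4, n=1000, width=0.2, trials=300, mode=mode)
        print(mode, rate)
```

In this toy setting, sharing the offset makes the two runs agree most of the time, while the same algorithm with independent offsets almost never agrees, and the deterministic grid agrees only about half the time on the boundary distribution. This is only meant to illustrate why the choice of fixing or not fixing the randomness matters; it does not reproduce the general impossibility argument described above.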
