We study generalization bounds for noisy stochastic mini-batch iterative algorithms based on the notion of stability. Recent years have seen key advances in data-dependent generalization bounds for noisy iterative learning algorithms such as stochastic gradient Langevin dynamics (SGLD), based on stability (Mou et al., 2018; Li et al., 2020) and on information-theoretic approaches (Xu and Raginsky, 2017; Negrea et al., 2019; Steinke and Zakynthinou, 2020; Haghifam et al., 2020). In this paper, we unify and substantially generalize stability-based generalization bounds and make three technical advances. First, we bound the generalization error of general noisy stochastic iterative algorithms (not necessarily gradient descent) in terms of expected (not uniform) stability. The expected stability can in turn be bounded by a Le Cam style divergence. Such bounds have an O(1/n) sample dependence, unlike many existing bounds with an O(1/\sqrt{n}) dependence. Second, we introduce Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD that allows exponential family noise to be used with stochastic gradient descent (SGD). We establish data-dependent, expected-stability-based generalization bounds for general EFLD algorithms. Third, we consider an important special case of EFLD: noisy sign-SGD, which extends sign-SGD using Bernoulli noise over {-1,+1}. Generalization bounds for noisy sign-SGD follow from those for EFLD, and we also establish optimization guarantees for the algorithm. Further, we present empirical results on benchmark datasets to illustrate that our bounds are non-vacuous and quantitatively much sharper than existing bounds.
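To make the noisy sign-SGD idea concrete, here is a minimal sketch of a single update in which each coordinate of the step direction is drawn from a Bernoulli distribution over {-1,+1}. The abstract does not state how the Bernoulli parameter depends on the gradient, so the sigmoid link used here, the `scale` parameter, and the function names `noisy_sign` and `noisy_sign_sgd_step` are all illustrative assumptions rather than the paper's definitions.

```python
import math
import random


def noisy_sign(grad, scale, rng):
    """Sample a vector in {-1,+1}^d, one coordinate per gradient entry.

    Assumption (not from the abstract): coordinate i is +1 with
    probability sigmoid(scale * grad[i]), and -1 otherwise.
    """
    signs = []
    for g in grad:
        z = scale * g
        # Numerically stable sigmoid: never exponentiates a large positive number.
        if z >= 0:
            p_plus = 1.0 / (1.0 + math.exp(-z))
        else:
            e = math.exp(z)
            p_plus = e / (1.0 + e)
        signs.append(1 if rng.random() < p_plus else -1)
    return signs


def noisy_sign_sgd_step(w, grad, lr, scale, rng):
    """One update: move each coordinate by lr against a sampled sign."""
    signs = noisy_sign(grad, scale, rng)
    return [wi - lr * si for wi, si in zip(w, signs)]


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
    # The full gradient stands in for a mini-batch gradient to keep the sketch short.
    rng = random.Random(0)
    target = [1.0, -2.0, 0.5]
    w = [0.0, 0.0, 0.0]
    for _ in range(200):
        grad = [wi - ti for wi, ti in zip(w, target)]
        w = noisy_sign_sgd_step(w, grad, lr=0.05, scale=5.0, rng=rng)
    print(w)
```

Under this assumed link, a large `scale` makes the sampled signs agree with the deterministic sign of the gradient almost always, recovering plain sign-SGD, while a `scale` near zero makes each coordinate close to a fair coin flip.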