Deep generative models (DGMs) are data-hungry because learning a complex model on limited data suffers from large variance and easily overfits. Inspired by the classical perspective of the bias-variance tradeoff, we propose the regularized deep generative model (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a certain divergence and the expectation of an energy function, where the divergence is between the data and the model distributions, and the energy function is defined by the pre-trained model w.r.t. the model distribution. We analyze a simple yet representative Gaussian-fitting case to demonstrate how the weighting hyperparameter trades off the bias and the variance. Theoretically, we characterize the existence and the uniqueness of the global minimum of Reg-DGM in a non-parametric setting and prove its convergence with neural networks trained by gradient-based methods. Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data and achieves results competitive with state-of-the-art methods. Our implementation is available at https://github.com/ML-GSAI/Reg-ADA-APA.
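As a minimal sketch of the objective described above (the notation here is assumed: $\mathcal{D}$ denotes the chosen divergence, $f$ the energy function defined by the pre-trained model, $p_{\mathrm{data}}$ and $p_{\theta}$ the data and model distributions, and $\lambda \ge 0$ the weighting hyperparameter), Reg-DGM can be written as

$$
\min_{\theta}\; \mathcal{D}\!\left(p_{\mathrm{data}},\, p_{\theta}\right) \;+\; \lambda\, \mathbb{E}_{x \sim p_{\theta}}\!\left[f(x)\right].
$$

Setting $\lambda = 0$ recovers the base DGM, while a larger $\lambda$ biases the solution toward low-energy regions of the pre-trained model, reflecting the bias-variance tradeoff analyzed in the Gaussian-fitting case.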