This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation. In this paradigm, one first trains a very large-scale model, {\it i.e.}, a super model (called a foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational cost, data cost, and carbon emissions. This matters to the AI industry, and especially to the enormous number of small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from a random initialization and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Ornstein-Uhlenbeck process, which converges to two Maxwell-Boltzmann distributions, respectively, each of which characterizes the corresponding converged model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then established via the PAC-Bayesian framework. The theory finds that the generalization error of the fine-tuning stage dominates in domain adaptation. In addition, our theory suggests that generalization is determined by a new measure of the domain discrepancy between the source domain and the target domain, based on the covariance matrices and the shift of the converged local minimum.
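The two-stage diffusion picture can be illustrated numerically. What follows is a minimal sketch, not the paper's construction. It assumes a locally quadratic loss, so that each stage is a multivariate Ornstein-Uhlenbeck process $d\theta = -A(\theta - \mu)\,dt + B\,dW$. Under this process the parameter converges to a Gaussian $\mathcal N(\mu, \Sigma)$, with $\Sigma$ solving the Lyapunov equation $A\Sigma + \Sigma A^\top = BB^\top$. Under this assumption, that Gaussian plays the role of the steady (Maxwell-Boltzmann-type) distribution. The curvature matrices `A_src`/`A_tgt`, noise matrices `B_src`/`B_tgt`, minima `mu_src`/`mu_tgt`, step size, and the helper names `stationary_cov` and `ou_stage` are all hypothetical choices made for illustration. The script prints the two covariance matrices and the shift of the minimum, which are the ingredients the abstract names. It does not reproduce the paper's actual discrepancy measure.

```python
import numpy as np


def stationary_cov(A, B):
    """Stationary covariance S of d(theta) = -A (theta - mu) dt + B dW.

    S solves the Lyapunov equation A S + S A^T = B B^T. Here it is solved by
    vectorisation: (I kron A + A kron I) vec(S) = vec(B B^T), column-major vec.
    """
    d = A.shape[0]
    eye = np.eye(d)
    Q = B @ B.T
    K = np.kron(eye, A) + np.kron(A, eye)
    vec_s = np.linalg.solve(K, Q.flatten(order="F"))
    return vec_s.reshape((d, d), order="F")


def ou_stage(theta, A, B, mu, dt, n_steps, rng):
    """Euler-Maruyama steps of the OU process for a cloud of particles (rows of theta)."""
    for _ in range(n_steps):
        drift = -(theta - mu) @ A.T                     # row-wise -A (theta - mu)
        noise = rng.standard_normal(theta.shape) @ B.T  # row-wise B xi, xi ~ N(0, I)
        theta = theta + drift * dt + np.sqrt(dt) * noise
    return theta


rng = np.random.default_rng(0)
n, d, dt, n_steps = 20_000, 2, 0.01, 2_000

# Hypothetical source-domain (pre-training) dynamics: curvature A, noise B, minimum mu.
A_src = np.array([[1.0, 0.2], [0.2, 0.8]])
B_src = 0.5 * np.eye(d)
mu_src = np.array([1.0, -1.0])
# Hypothetical target-domain (fine-tuning) dynamics with a shifted minimum.
A_tgt = np.array([[1.5, 0.0], [0.0, 0.6]])
B_tgt = np.array([[0.4, 0.0], [0.1, 0.3]])
mu_tgt = np.array([1.5, -0.5])

theta0 = rng.normal(scale=3.0, size=(n, d))                             # random initialisation
theta_pre = ou_stage(theta0, A_src, B_src, mu_src, dt, n_steps, rng)    # stage 1: pre-training
theta_ft = ou_stage(theta_pre, A_tgt, B_tgt, mu_tgt, dt, n_steps, rng)  # stage 2: fine-tuning

for name, th, A, B, mu in [("pre-training", theta_pre, A_src, B_src, mu_src),
                           ("fine-tuning", theta_ft, A_tgt, B_tgt, mu_tgt)]:
    print(name, "empirical mean:", th.mean(axis=0).round(3), "minimum:", mu)
    print("  empirical cov:\n", np.cov(th.T).round(3))
    print("  Lyapunov cov:\n", stationary_cov(A, B).round(3))
print("shift of the minimum:", np.linalg.norm(mu_tgt - mu_src).round(3))
```

The fine-tuning stage starts from the pre-trained particle cloud rather than from a fresh random draw. This mirrors the "transport" from one steady distribution to another. Because the process forgets its initial condition exponentially fast, the second-stage samples end up matching the target Gaussian up to Monte Carlo and $\mathcal O(dt)$ discretisation error.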