The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers (e.g., SGD). In this paper, we propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using such optimizers to train generic (simple) models. As an implementation, we add the prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters; this methodology is referred to as Gradient Re-parameterization, and the resulting optimizers are named RepOptimizers. For extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed, and training efficiency. Compared to Structural Re-parameterization, which adds priors into models by constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. The code and models are publicly available at https://github.com/DingXiaoH/RepOptimizers.
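As a rough illustration of how model-specific priors can live in the optimizer rather than in extra training-time structures, the sketch below rescales each gradient by a constant, prior-derived factor before an ordinary SGD update. This is a minimal conceptual sketch under that assumption, not the released RepOptimizers code; the class name GradScaleSGD and the grad_scales argument are hypothetical.

```python
# Conceptual sketch of gradient rescaling inside an optimizer (hypothetical names):
# model-specific prior knowledge is encoded as constant per-parameter scales that
# multiply the gradients before a plain SGD step, so a simple model is trained
# without any extra forward/backward computations.
import torch


class GradScaleSGD(torch.optim.SGD):
    def __init__(self, params, grad_scales, lr=0.1, momentum=0.9, weight_decay=0.0):
        # grad_scales: one tensor per parameter, in the same order as params,
        # derived from a set of model-specific hyper-parameters.
        super().__init__(params, lr=lr, momentum=momentum, weight_decay=weight_decay)
        self.grad_scales = list(grad_scales)

    def step(self, closure=None):
        # Rescale each gradient in place, then run the standard SGD update.
        with torch.no_grad():
            flat_params = [p for g in self.param_groups for p in g["params"]]
            for p, s in zip(flat_params, self.grad_scales):
                if p.grad is not None:
                    p.grad.mul_(s)
        return super().step(closure)


# Usage: with all scales equal to 1, this reduces to plain SGD.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU())
params = list(model.parameters())
scales = [torch.ones_like(p) for p in params]  # prior-derived scales would go here
opt = GradScaleSGD(params, scales, lr=0.1)
```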