We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which commonly occurs in practice. Previous works have shown that constraining the distance of the fine-tuned model from its initialization improves generalization. Using a PAC-Bayesian analysis, we observe that besides the distance from initialization, the Hessian of the loss affects generalization through the noise stability of deep neural networks against noise injections. Motivated by this observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of noisy labels. We design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning, along with a generalization error guarantee under class-conditional independent noise in the training labels. We perform a detailed empirical study of our algorithm in various noisy environments and with various architectures. On six image classification tasks whose training labels are generated by programmatic labeling, we find a 3.26% accuracy gain over prior fine-tuning methods. Meanwhile, the Hessian distance measure of the fine-tuned model decreases by a factor of six compared with existing approaches.
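To make the notion of noise stability concrete, the following is a minimal sketch, not the paper's exact procedure, of how one might empirically probe a network's stability against weight perturbations: inject isotropic Gaussian noise into the parameters and measure how much the loss increases. The model, `inputs`, `labels`, and the hyperparameters `sigma` and `num_trials` are illustrative assumptions.

```python
# Hypothetical sketch: estimate noise stability of a PyTorch classifier by
# measuring the average loss increase under Gaussian weight perturbations.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def perturbed_loss_gap(model, inputs, labels, sigma=0.01, num_trials=10):
    """Average increase in cross-entropy loss under isotropic Gaussian weight noise."""
    base_loss = F.cross_entropy(model(inputs), labels).item()
    gaps = []
    for _ in range(num_trials):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))  # inject noise into every weight
        gaps.append(F.cross_entropy(noisy(inputs), labels).item() - base_loss)
    return sum(gaps) / num_trials
```

In the PAC-Bayesian view, a small gap under such perturbations (which is governed by the curvature, i.e., the Hessian, of the loss surface) corresponds to better generalization.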
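As an illustration of the distance-based regularization component, here is a minimal sketch, assuming a standard PyTorch training setup, of fine-tuning with an L2 penalty on the distance from the pretrained initialization. The function name, `train_loader`, and `reg_strength` are hypothetical and not taken from the paper's implementation.

```python
# Hypothetical sketch: fine-tune a pretrained model while penalizing the squared
# L2 distance of the parameters from their pretrained initialization.
import torch
import torch.nn.functional as F

def finetune_with_distance_reg(model, train_loader, epochs=5, lr=1e-4, reg_strength=0.01):
    """Fine-tune `model`, regularizing the distance from its initialization."""
    # Snapshot the pretrained weights before any updates.
    init_params = [p.detach().clone() for p in model.parameters()]
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), labels)
            # Squared L2 distance from the pretrained initialization.
            dist = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), init_params))
            (loss + reg_strength * dist).backward()
            optimizer.step()
    return model
```

A noise-robust (consistent) loss could replace the cross-entropy term above when the training labels are noisy; the distance penalty itself is the part tied to the generalization bounds discussed in the abstract.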