We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We investigate the generalization properties of fine-tuning to understand the problem of overfitting, which often arises in practice. Previous works have shown that constraining the distance from the initialization during fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides the distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. Motivated by this observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Next, we investigate the robustness of fine-tuning with noisy labels. We design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning. Additionally, we prove a generalization error bound for our algorithm under class-conditional independent noise in the training-set labels. We perform a detailed empirical study of our algorithm in various noisy environments and across architectures. For example, on six image classification tasks whose training labels are generated with programmatic labeling, we show a 3.26% accuracy improvement over prior methods. Meanwhile, the Hessian distance measure of the network fine-tuned with our algorithm decreases by a factor of six compared with existing approaches.
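To make the distance-based regularization idea concrete, the following is a minimal sketch, not the paper's exact algorithm: fine-tuning with an L2 penalty on the distance of the weights from their pretrained initialization. The model, data loader, and hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fine_tune_with_init_distance(model, loader, epochs=3, lr=1e-4, reg=1e-3):
    """Fine-tune `model` while penalizing its distance from the pretrained weights.

    Hypothetical helper for illustration; `reg` controls the strength of the
    distance-from-initialization regularizer.
    """
    # Snapshot the pretrained weights w_0; the penalty keeps the fine-tuned
    # weights close to this initialization.
    init_params = [p.detach().clone() for p in model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            logits = model(inputs)
            loss = F.cross_entropy(logits, labels)
            # Distance-from-initialization regularizer: sum_i ||w_i - w_0,i||^2.
            dist = sum(((p - p0) ** 2).sum()
                       for p, p0 in zip(model.parameters(), init_params))
            (loss + reg * dist).backward()
            optimizer.step()
    return model
```

In this sketch the regularizer simply adds `reg * ||w - w_0||^2` to the task loss; the paper's method additionally uses consistent losses for label noise and derives Hessian distance-based bounds, which are not reproduced here.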