In this paper, we study how pretraining label granularity affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, in which the pretraining labels are more fine-grained than those of the target problem. We experiment with this method using the label hierarchy of iNaturalist 2021 and observe an 8.76% relative improvement in error rate over the baseline. We find the following conditions are key to the improvement: 1) the pretraining dataset has a strong and meaningful label hierarchy, 2) its label function aligns strongly with that of the target task, and, most importantly, 3) an appropriate level of pretraining label granularity is chosen. The importance of pretraining label granularity is further corroborated by our transfer learning experiments on ImageNet. Most notably, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining at coarser granularity levels, which supports the common practice. Theoretically, through an analysis of a two-layer convolutional ReLU network, we prove that: 1) models trained on coarse-grained labels respond strongly only to the common or "easy-to-learn" features; 2) when the dataset satisfies the right conditions, fine-grained pretraining encourages the model to also learn the rarer or "harder-to-learn" features well, thus improving the model's generalization.
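As a concrete illustration of the fine-to-coarse setting, the sketch below pretrains a classifier on fine-grained labels and then swaps in a new head for the coarse-grained target task. This is a minimal PyTorch sketch, not the paper's exact recipe: the backbone choice (resnet50), the class counts, and the optimizer settings are illustrative assumptions.

```python
# Minimal sketch of fine-to-coarse transfer learning (assumed setup, not the paper's exact configuration).
import torch
import torch.nn as nn
from torchvision.models import resnet50

num_fine_classes = 10000   # e.g., leaf-level labels of the pretraining hierarchy (placeholder)
num_coarse_classes = 100   # coarser labels of the target task (placeholder)

# 1) Pretrain at fine granularity: the classification head predicts fine-grained labels.
model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_fine_classes)
# ... run standard supervised pretraining on the fine-grained labels here ...

# 2) Transfer: keep the backbone, replace the head for the coarse-grained target labels.
model.fc = nn.Linear(model.fc.in_features, num_coarse_classes)

# 3) Fine-tune the whole network on the coarse-grained target task.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_step(images: torch.Tensor, coarse_labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of target-task data."""
    optimizer.zero_grad()
    loss = criterion(model(images), coarse_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```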