Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at training time while avoiding them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. Our augmented networks significantly outperform their counterparts without fully-connected layers, reaching a relative improvement of up to $16\%$ validation accuracy in the supervised setting without adding any extra parameters during inference.
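To make the idea concrete, the sketch below is a minimal, PyTorch-style illustration of one plausible instantiation: a CNN backbone with its plain classifier head plus an auxiliary FC-augmented head, trained jointly with a mutual distillation term, where only the plain head is used at inference. All names here (`TwoHeadCNN`, `joint_distillation_loss`, `fc_width`, `T`, `alpha`) are hypothetical and the exact losses in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadCNN(nn.Module):
    """CNN backbone with the original linear head (kept at test time) and an
    auxiliary head with extra FC layers (used only during training).
    Hypothetical sketch, not the paper's exact architecture."""
    def __init__(self, backbone, feat_dim, num_classes, fc_width=512):
        super().__init__()
        self.backbone = backbone                       # any CNN feature extractor
        self.head = nn.Linear(feat_dim, num_classes)   # original head, kept at inference
        self.fc_head = nn.Sequential(                  # extra FC layers, train-time only
            nn.Linear(feat_dim, fc_width),
            nn.ReLU(inplace=True),
            nn.Linear(fc_width, num_classes),
        )

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            return self.head(feats), self.fc_head(feats)
        return self.head(feats)                        # no extra parameters at inference

def joint_distillation_loss(logits_main, logits_fc, targets, T=4.0, alpha=0.5):
    """Cross-entropy on both heads plus a bidirectional KL term so the plain
    head absorbs what the FC-augmented head learns (one possible form of
    online joint distillation; temperature T and weight alpha are assumptions)."""
    ce = F.cross_entropy(logits_main, targets) + F.cross_entropy(logits_fc, targets)
    log_p_main = F.log_softmax(logits_main / T, dim=1)
    log_p_fc = F.log_softmax(logits_fc / T, dim=1)
    kl = F.kl_div(log_p_main, log_p_fc.exp(), reduction="batchmean") \
       + F.kl_div(log_p_fc, log_p_main.exp(), reduction="batchmean")
    return ce + alpha * (T * T) * kl
```

At training time both heads receive the supervised loss and distill into each other; at test time `model.eval()` makes the forward pass return only the original head's logits, so the deployed network has the same parameter count as the unaugmented CNN.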