Large-capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem constructs new training samples by mixing a pair (or more) of training samples. We propose PatchUp, a hidden-state block-level regularization technique for Convolutional Neural Networks (CNNs), applied to selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we mix contiguous blocks of features in the hidden space, which has more dimensions than the input space, we obtain more diverse training samples along different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures, including PreActResNet18/34, WRN-28-10, and ResNet101/152 models, show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp generalizes better to deformed samples and is more robust against adversarial attacks.
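To make the core idea concrete, below is a minimal sketch of mixing a contiguous spatial block of feature maps between random pairs of samples in a batch. The function name `patchup_block_mix` and the `block_size` parameter are illustrative assumptions; the actual PatchUp mask construction and its soft/hard mixing variants follow the paper and are more elaborate than this sketch.

```python
import torch

def patchup_block_mix(hidden, block_size=7):
    """Sketch: swap a contiguous block of feature-map activations between
    random pairs of samples in a batch (hidden-state block-level mixing).

    hidden: feature maps of shape (batch, channels, height, width).
    block_size: hypothetical hyper-parameter for the spatial extent of the block.
    """
    batch_size, _, height, width = hidden.shape

    # Random pairing: permute the batch so each sample gets a partner.
    indices = torch.randperm(batch_size)

    # Pick a random top-left corner for the contiguous spatial block.
    top = torch.randint(0, max(height - block_size, 1), (1,)).item()
    left = torch.randint(0, max(width - block_size, 1), (1,)).item()

    # Binary mask marking the block taken from the partner sample.
    mask = torch.zeros_like(hidden)
    mask[:, :, top:top + block_size, left:left + block_size] = 1.0

    # Replace the selected block with the partner's activations.
    mixed = (1 - mask) * hidden + mask * hidden[indices]

    # Fraction of activations coming from the partner; a training loop
    # would interpolate the targets with a corresponding weight.
    portion = mask.mean().item()
    return mixed, indices, portion
```

In a training loop, such a mixing step would typically be inserted after a randomly chosen hidden layer of the CNN, with the loss computed as a combination of the losses against the original and partner targets, weighted by the mixed portion.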