Transformers are becoming increasingly popular due to their superior performance over conventional convolutional neural networks (CNNs). However, transformers usually require much more memory to train than CNNs, which prevents their application in many low-resource settings. Local learning, which divides the network into several distinct modules and trains them individually, is a promising alternative to the end-to-end (E2E) training approach, reducing the memory required for training and increasing parallelism. This paper is the first to apply local learning to transformers for this purpose. The standard CNN-based local learning method, InfoPro [32], reconstructs the input images at each module of a CNN. However, reconstructing the entire image does not generalize well. In this paper, we propose a new mechanism for each local module: instead of reconstructing the entire image, each module reconstructs its input features, generated by the previous modules. We evaluate our approach on 4 commonly used datasets and 3 commonly used decoder structures on Swin-Tiny. The experiments show that our approach outperforms InfoPro-Transformer, a variant of InfoPro with a Transformer backbone that we introduce, by up to 0.58% on the CIFAR-10, CIFAR-100, STL-10, and SVHN datasets, while using up to 12% less memory. Compared to the E2E approach, we require 36% less GPU memory when the network is divided into 2 modules and 45% less GPU memory when it is divided into 4 modules.
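To make the feature-reconstruction mechanism concrete, the following is a minimal PyTorch sketch of one locally trained module. It is an illustration under stated assumptions, not the paper's implementation: the `LocalModule` and decoder names are hypothetical, the decoder is a placeholder two-layer MLP (the paper evaluates three decoder structures), and the transformer stage is assumed to preserve the feature dimension for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of local learning with feature reconstruction:
# each module trains on its own auxiliary loss, and gradients never
# cross module boundaries, which is what caps per-module GPU memory.

class LocalModule(nn.Module):
    """One locally trained block: a transformer stage plus a small decoder
    that reconstructs the block's *input features* (not the raw image)."""

    def __init__(self, stage: nn.Module, feat_dim: int):
        super().__init__()
        self.stage = stage
        # Placeholder lightweight decoder; the actual decoder structure
        # is one of the design choices evaluated in the paper.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.GELU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x: torch.Tensor):
        out = self.stage(x)
        recon = self.decoder(out)
        # Reconstruction target is the module's own input features x,
        # detached so no gradient flows back into earlier modules.
        local_loss = F.mse_loss(recon, x.detach())
        # Detach the output so the next module trains independently.
        return out.detach(), local_loss
```

In a training loop, each module's `local_loss` is backpropagated only through that module's parameters; the detached outputs are forwarded to the next module, so activations for earlier modules can be freed immediately.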