Improving existing neural network architectures can involve several design choices, such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The latter approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, the increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods addressing this issue, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by preserving information from the input image through interlaced auto-encoders (AEs), and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on five collections, i.e., MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and Caltech-256. The SIRe-extended architectures achieve significantly improved performance across all models and datasets, thus confirming the effectiveness of the presented approach.
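To make the interlaced multi-task idea concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation. It assumes a toy layout in which each convolutional stage carries a residual (skip) connection and a small auxiliary auto-encoder branch that reconstructs the stage input, so that reconstruction losses can be interlaced with the classification loss. All names (SIReBlock, SIReCNN, sire_loss) and parameters (width, alpha) are hypothetical choices for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SIReBlock(nn.Module):
    """Conv stage with a residual connection and an auxiliary auto-encoder branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Auxiliary auto-encoder: compress the block output, then decode it back
        # to the input shape so information from earlier layers is preserved.
        self.encoder = nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h = F.relu(self.conv1(x))
        h = self.conv2(h)
        out = F.relu(h + x)  # residual (skip) connection
        recon = self.decoder(F.relu(self.encoder(out)))  # AE reconstruction branch
        return out, recon


class SIReCNN(nn.Module):
    """Toy classifier whose blocks also emit reconstructions for a multi-task loss."""

    def __init__(self, in_channels=3, num_classes=10, width=32, num_blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(SIReBlock(width) for _ in range(num_blocks))
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        h = F.relu(self.stem(x))
        aux = []  # (reconstruction, reconstruction target) pairs, one per block
        for block in self.blocks:
            inp = h
            h, recon = block(h)
            aux.append((recon, inp))
        logits = self.head(F.adaptive_avg_pool2d(h, 1).flatten(1))
        return logits, aux


def sire_loss(logits, targets, aux, alpha=0.1):
    """Interlaced multi-task loss: classification plus weighted reconstruction terms.

    `alpha` is an illustrative weight balancing the auxiliary AE objectives.
    """
    loss = F.cross_entropy(logits, targets)
    for recon, target_feat in aux:
        loss = loss + alpha * F.mse_loss(recon, target_feat)
    return loss


if __name__ == "__main__":
    # Usage sketch on random data.
    model = SIReCNN(in_channels=3, num_classes=10)
    images = torch.randn(4, 3, 32, 32)
    labels = torch.randint(0, 10, (4,))
    logits, aux = model(images)
    loss = sire_loss(logits, labels, aux)
    loss.backward()
```

In this sketch the auxiliary reconstruction gradients flow back through earlier layers alongside the classification gradient, which is the intuition behind using interlaced AEs to mitigate vanishing gradients; the exact placement and weighting of the auxiliary branches in SIRe may differ from this toy configuration.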