Despite the increasing prevalence of deep neural networks, their applicability on resource-constrained devices is limited due to their computational load. While modern devices exhibit a high level of parallelism, real-time latency is still highly dependent on network depth. Although recent works show that below a certain depth, the width of shallower networks must grow exponentially, we presume that neural networks typically exceed this minimal depth to accelerate convergence and incrementally increase accuracy. This motivates us to transform pre-trained deep networks that already exploit such advantages into shallower forms. We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one. We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth. Finally, we use our method to provide more efficient alternatives to MobileNetV2 and EfficientNet-Lite architectures on the ImageNet classification task.
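To make the folding idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a PReLU-style activation whose learned slope can be driven toward the identity, and a helper that collapses two consecutive fully-connected layers into one once the activation between them has been removed. The class and function names, the clamped-slope parameterization, and the linearity penalty are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FoldableActivation(nn.Module):
    """PReLU-style activation f(x) = max(x, slope * x) with a learned slope in [0, 1].
    As the slope approaches 1 the activation becomes the identity, signalling that
    the surrounding linear layers may be folded. (Illustrative parameterization.)"""

    def __init__(self):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(0.0))  # 0 -> ReLU-like, 1 -> identity

    def forward(self, x):
        slope = self.slope.clamp(0.0, 1.0)
        return torch.maximum(x, slope * x)

    def linearity_penalty(self):
        # Hypothetical regularizer added to the task loss to push the
        # activation toward the identity (slope -> 1).
        return 1.0 - self.slope.clamp(0.0, 1.0)


def fold_linear_pair(fc1: nn.Linear, fc2: nn.Linear) -> nn.Linear:
    """Collapse fc2(fc1(x)) into a single linear layer. This is only exact when
    the activation between fc1 and fc2 has become the identity."""
    folded = nn.Linear(fc1.in_features, fc2.out_features)
    with torch.no_grad():
        # y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
        folded.weight.copy_(fc2.weight @ fc1.weight)
        b1 = fc1.bias if fc1.bias is not None else torch.zeros(fc1.out_features)
        b2 = fc2.bias if fc2.bias is not None else torch.zeros(fc2.out_features)
        folded.bias.copy_(fc2.weight @ b1 + b2)
    return folded


# Sanity check: once the intermediate activation is the identity, folding is exact.
fc1, fc2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(2, 8)
assert torch.allclose(fold_linear_pair(fc1, fc2)(x), fc2(fc1(x)), atol=1e-5)
```

The same algebra applies to consecutive convolutions or to a convolution followed by batch normalization; only the composition of the weight tensors changes.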