Deep neural networks usually have to be compressed and accelerated for use on low-power devices, e.g., mobile devices. Recently, massively parallel hardware accelerators have been developed that offer high throughput and low latency at low power by utilizing in-memory computation. However, to exploit these benefits, the computational graph of a neural network has to fit into the in-memory compute of these hardware systems, which is usually rather limited in size. In this study, we introduce a class of network models whose computational graphs have a small memory footprint. To this end, the graph is designed to contain loops by iteratively executing a single network building block. Furthermore, the trade-off between accuracy and latency of these so-called iterative neural networks is improved by adding multiple intermediate outputs, both during training and inference. We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets, which are especially demanding in terms of computational resources. In ablation studies, we investigate the improvement of network training through intermediate network outputs as well as the trade-off between weight sharing across iterations and network size.
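The following is a minimal sketch, not the authors' implementation, of the iterative idea described above: a single building block whose weights are shared across loop iterations, with a lightweight head producing an intermediate output after every pass. All names (`IterativeNet`, `SharedBlock`-style layers, `num_iterations`) and the PyTorch formulation are illustrative assumptions.

```python
# Illustrative sketch (assumed PyTorch-style formulation, not the paper's code):
# one shared block executed in a loop, with an intermediate output per iteration.
import torch
import torch.nn as nn


class IterativeNet(nn.Module):
    def __init__(self, channels: int, num_classes: int, num_iterations: int = 3):
        super().__init__()
        # Single building block whose weights are reused in every iteration,
        # keeping the memory footprint of the computational graph small.
        self.shared_block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Lightweight head that produces a segmentation output after each iteration.
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)
        self.num_iterations = num_iterations

    def forward(self, x: torch.Tensor):
        outputs = []
        for _ in range(self.num_iterations):
            x = self.shared_block(x)       # same weights on every loop pass
            outputs.append(self.head(x))   # intermediate output per iteration
        return outputs                     # all outputs can be supervised in training


# During training, a loss can be attached to every intermediate output;
# at inference, stopping after fewer iterations trades accuracy for latency.
features = torch.randn(1, 32, 64, 64)
model = IterativeNet(channels=32, num_classes=11)
predictions = model(features)
print([p.shape for p in predictions])
```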