Camera-based deep learning algorithms are increasingly needed for perception in automated driving systems. However, automotive constraints make CNN deployment challenging, because the networks must run on embedded systems with limited computational resources. In this paper, we propose an approach to embed a multi-task CNN under such conditions on a commercial prototype platform, i.e., a low-power System on Chip (SoC) processing four surround-view fisheye cameras at 10 FPS. First, we design an efficient and compact multi-task network architecture. Second, a pruning method is applied to compress the CNN, reducing runtime and memory usage by a factor of 2 without significantly lowering performance. Finally, several embedded optimization techniques, such as the use of mixed quantization formats and efficient data transfers between memory areas, are proposed to ensure real-time execution and avoid bandwidth bottlenecks. The approach is evaluated on the hardware platform in terms of embedded detection performance, runtime, and memory bandwidth. Unlike most works in the literature, which focus on the classification task, we study the effect of pruning and quantization on a compact multi-task network that performs object detection, semantic segmentation, and soiling detection.
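To make the two compression steps concrete, the following is a minimal sketch and not the method used in this work: the abstract does not specify the pruning criterion or the quantization formats. The sketch assumes L1-norm filter pruning and symmetric linear quantization, and it reads "mixed quantization" as different bit widths for different layers. The function names `prune_filters`, `quantize_symmetric`, and `dequantize`, and the toy weights, are hypothetical.

```python
from typing import List, Tuple

Filter = List[float]  # one flattened convolution filter (toy representation)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Return the sorted indices of the filters with the largest L1 norm.

    Assumption: L1-norm ranking stands in for the (unspecified) pruning method.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric linear quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate real values."""
    return [q * scale for q in quantized]


# Toy layer with four filters; keeping half of them mirrors a factor-of-2 reduction
# in the number of filters for this layer.
layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
kept = prune_filters(layer, keep_ratio=0.5)
pruned_layer = [layer[i] for i in kept]

# Hypothetical mixed-precision choice: 8 bits for one tensor, 16 bits for another.
q8, scale8 = quantize_symmetric(pruned_layer[0], bits=8)
q16, scale16 = quantize_symmetric(pruned_layer[1], bits=16)
```

In this sketch, the wider 16-bit format trades memory for a smaller rounding error, which is the kind of per-layer trade-off a mixed-format scheme would have to balance against bandwidth.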