In recent years, a large number of complex deep neural networks (DNNs) have been developed for a variety of applications, including speech and face recognition, computer vision in healthcare, automatic translation, and image classification. Moreover, there is an increasing demand for deploying these networks on resource-constrained edge devices. As the computational demands of these models keep growing and push the targeted devices to their limits, new hardware systems tailored to these workloads are constantly being developed. Since the programmability of these diverse and complex platforms, compounded by the rapid development of new DNN models, is a major challenge, platform vendors have developed machine-learning-tailored SDKs to maximize their platforms' performance. This work investigates the performance achieved on a number of modern commodity embedded platforms, coupled with the vendor-provided software support, when state-of-the-art DNN models for image classification, object detection and image segmentation are targeted. The work quantifies the relative latency gains of the particular embedded platforms and provides insights into the minimum batch size required to achieve maximum throughput, concluding that modern embedded systems reach their maximum performance even at modest batch sizes when a modern state-of-the-art DNN model is targeted. Overall, the presented results provide a guide to the expected performance of a number of state-of-the-art DNNs on popular embedded platforms across the image classification, detection and segmentation domains.
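The notion of a minimum batch size for maximum throughput can be made concrete with a small sketch. The Python snippet below is an illustration only, not the procedure used in this work: the function names, the 5% tolerance, and the latency values are all hypothetical, and the numbers are made up rather than measured. It computes throughput as batch size divided by per-batch latency and returns the smallest batch size whose throughput lies within the tolerance of the peak.

```python
from typing import Dict


def throughput(batch_size: int, latency_s: float) -> float:
    """Images per second when one batch of `batch_size` takes `latency_s` seconds."""
    return batch_size / latency_s


def min_saturating_batch(latencies: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest batch size whose throughput is within `tolerance` of the peak.

    `latencies` maps batch size -> per-batch latency in seconds.
    The 5% default tolerance is an arbitrary choice for this sketch.
    """
    rates = {b: throughput(b, t) for b, t in latencies.items()}
    peak = max(rates.values())
    return min(b for b, r in rates.items() if r >= (1.0 - tolerance) * peak)


# Hypothetical per-batch latencies in seconds (NOT measured results).
example = {1: 0.010, 2: 0.014, 4: 0.022, 8: 0.041, 16: 0.082, 32: 0.164}
print(min_saturating_batch(example))
```

With these illustrative values, throughput stops improving once latency starts to scale linearly with batch size, which is the saturation behaviour the abstract refers to.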