How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices remains difficult: large computational cost, multiple performance objectives, hardware heterogeneity and a common need for high accuracy together pose critical problems for running DNNs on the diverse embedded and mobile devices in the wild. As a result, state-of-the-art deep learning algorithms have yet to reach mainstream use on consumer devices. In this paper, we provide preliminary answers to the potentially game-changing question of how to reach real-time AI on such devices by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach. These span model-, system- and hardware-level techniques, as well as their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration of how the various cross-stack solutions can best be combined to bring the latest advances in deep learning close to users in a robust and efficient manner.
