Today, artificial neural networks are among the main drivers of progress in machine learning. This has particularly spurred the development of dedicated neural network acceleration hardware. However, since most of these architectures require specialized toolchains, developers face additional effort each time they want to make use of a new deep learning accelerator. Furthermore, the flexibility of such a device is bound to the architecture itself as well as to the functionality of its runtime environment. In this paper we propose a toolflow that uses TensorFlow as its frontend, thus offering developers a familiar environment. On the backend we use an FPGA that is addressable via an HSA runtime environment. In this way we are able to hide the complexity of controlling new hardware from the user while at the same time maintaining a high degree of flexibility. Our HSA toolflow achieves this because the hardware is not statically configured with the structure of the network. Instead, it can be dynamically reconfigured at runtime with the respective kernels executed by the network, and simultaneously with kernels from other sources, e.g. OpenCL/OpenMP.