Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures, and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks, which assume a pre-determined model architecture and input data shapes, assumptions that are violated by dynamic neural networks. Therefore, executing dynamic models with existing deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is also more challenging than optimizing static ones: optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a lightweight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.
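To make the notion of model dynamism concrete, the following toy sketch (plain NumPy, not Nimble's API, and not taken from the paper) shows a model whose loop count and input tensor shapes are data-dependent: a static compiler that fixes shapes and unrolls control flow ahead of time cannot cover every input, which is the class of workload the abstract describes.

```python
import numpy as np

def dynamic_model(tokens, w):
    # Data-dependent control flow: the number of loop iterations
    # depends on the length of the input sequence, which is
    # unknown until runtime.
    h = np.zeros(w.shape[0])
    for t in tokens:  # tokens has a dynamic leading dimension
        h = np.tanh(w @ h + t)
    return h

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
# Two inputs with different shapes exercise different execution paths;
# a static-shape compiler would need a separate program for each.
short_seq = rng.standard_normal((3, 4))
long_seq = rng.standard_normal((7, 4))
print(dynamic_model(short_seq, w).shape)  # (4,)
print(dynamic_model(long_seq, w).shape)   # (4,)
```

A system like the one described must type-check and optimize such a program for all admissible shapes and paths, rather than for one traced instance.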