From DNNs to GANs: Review of efficient hardware architectures for deep learning

In recent times, the very large scale integration (VLSI) industry has been pursuing several goals at once, for example, lower energy consumption, smaller silicon area, precise results, less power dissipation, and faster response. To meet these needs, the hardware architecture should be reliable and robust to these problems. Recently, neural networks and deep learning have begun to influence the present research paradigm significantly. These models involve parameters on the order of millions, nonlinear activation functions, convolutional operations for feature extraction, regression for classification, and generative adversarial networks. Such operations incur huge computation and memory overhead. Presently available DSP processors are incapable of performing these operations, and they mostly face problems such as memory overhead, performance drop, and compromised accuracy. Moreover, if a large silicon area is powered to accelerate the operations through parallel computation, the ICs have a significant chance of burning out due to the considerable heat generated. Hence, a novel dark silicon constraint has been developed to reduce heat dissipation without sacrificing accuracy. Similarly, different algorithms have been adapted to design DSP processors suited to fast execution of neural networks, activation functions, convolutional neural networks, and generative adversarial networks. In this review, we illustrate the recent developments in hardware for accelerating the efficient implementation of deep learning networks with enhanced performance. The techniques investigated in this review are expected to guide future research on hardware optimization for high-performance computation.
