The continued demand for improvements in the accuracy, throughput, and efficiency of Deep Neural Networks has produced a multitude of methods that exploit custom architectures on FPGAs. These include hand-crafted networks and the use of quantization and pruning to remove extraneous network parameters. However, with the potential of static solutions already well exploited, we propose shifting the focus to the varying difficulty of individual data samples to further improve efficiency and reduce the average compute for classification. Input-dependent computation allows the network to make runtime decisions and finish a task early once an intermediate result meets a confidence threshold. Early-Exit network architectures have become an increasingly popular way to implement such behaviour in software. We create ATHEENA (A Toolflow for Hardware Early-Exit Network Automation), an automated FPGA toolflow that leverages the probability of samples exiting early from such networks to scale the resources allocated to different sections of the network. The toolflow uses the data-flow model of fpgaConvNet, extended to support Early-Exit networks, together with Design Space Exploration to optimize the generated streaming-architecture hardware, with the goal of increasing throughput and reducing area while maintaining accuracy. Experimental results on three different networks demonstrate a throughput increase of $2.00\times$ to $2.78\times$ over an optimized baseline network implementation with no early exits. Additionally, the toolflow can match the throughput of the same baseline using as little as $46\%$ of the resources the baseline requires.
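The confidence-threshold mechanism described above can be illustrated with a minimal sketch. This is a hypothetical software model of the early-exit decision rule, not ATHEENA's hardware implementation: the `stages`, `exit_heads`, and `threshold` names are illustrative assumptions, with each stage standing in for a section of the network and each exit head producing class logits.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def early_exit_classify(x, stages, exit_heads, threshold=0.9):
    """Run network stages in order. After each stage, the attached exit
    head produces class probabilities; if the top probability meets the
    confidence threshold, classification finishes early, skipping the
    remaining (more expensive) stages."""
    probs = None
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        x = stage(x)                     # compute this section of the network
        probs = softmax(head(x))         # intermediate classification attempt
        if probs.max() >= threshold:     # confident enough: exit here
            return int(probs.argmax()), i
    # No intermediate exit fired: fall through to the final exit.
    return int(probs.argmax()), len(stages) - 1

# Toy usage: two identity stages with fixed-logit exit heads. An "easy"
# sample is confident at the first exit; a "hard" one reaches the last.
stages = [lambda x: x, lambda x: x]
easy_heads = [lambda x: np.array([8.0, 0.0]), lambda x: np.array([0.0, 8.0])]
hard_heads = [lambda x: np.array([0.1, 0.0]), lambda x: np.array([6.0, 0.0])]

easy_pred, easy_exit = early_exit_classify(np.zeros(2), stages, easy_heads)
hard_pred, hard_exit = early_exit_classify(np.zeros(2), stages, hard_heads)
```

Averaged over a dataset, the fraction of samples taking each exit is exactly the exit probability that ATHEENA uses to scale the hardware resources assigned to each network section.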