Deep Learning (DL) has transformed the automation of a wide range of industries and is increasingly ubiquitous in society. The increasing complexity of DL models and their widespread adoption have led to their energy consumption doubling every 3-4 months. Currently, the relationship between DL model configuration and energy consumption is not well established. Current FLOPs- and MACs-based methods consider only linear operations. In this paper, we develop a bottom-level Transistor Operations (TOs) method to expose how activation functions and neural network structure shape the scaling of energy consumption with DL model configuration. TOs allows us to uncover the role played by non-linear operations (e.g. the division/root operations performed by activation functions and batch normalisation). As such, our proposed TOs model provides developers with a hardware-agnostic index for how energy consumption scales with model settings. To validate our work, we analyse the TOs energy scaling of a set of feed-forward DNN models and achieve 98.2%-99.97% precision in estimating their energy consumption. We believe this work can be extended to any DL model.
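To make the distinction the abstract draws concrete, the following is a minimal, illustrative Python sketch (not the paper's actual TO formula): a MACs-only count captures just the linear part of a dense layer, whereas an extended count also tallies the element-wise non-linear operations applied by the activation. The per-element activation cost used below is a hypothetical placeholder.

```python
# Illustrative sketch only: contrast a MACs-only operation count with a count
# that also includes the non-linear (e.g. exponential/division) operations
# applied per output element by an activation such as the sigmoid.

def macs_dense(n_in: int, n_out: int) -> int:
    """Multiply-accumulate operations of a fully connected layer (linear part only)."""
    return n_in * n_out

def ops_with_activation(n_in: int, n_out: int, act_ops_per_element: int) -> int:
    """MACs plus the non-linear operations applied once per output element."""
    return macs_dense(n_in, n_out) + n_out * act_ops_per_element

if __name__ == "__main__":
    n_in, n_out = 784, 256
    print("MACs only:        ", macs_dense(n_in, n_out))
    # Assume, purely for illustration, 4 low-level operations per sigmoid output.
    print("MACs + activation:", ops_with_activation(n_in, n_out, act_ops_per_element=4))
```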