Deep Neural Networks (DNNs) are the de facto approach for tackling cognitive tasks in real-world applications such as speech recognition and natural language processing. DNN inference comprises numerous dot product operations between inputs and weights, requiring many multiplications and memory accesses that degrade performance and increase energy consumption when evaluated on modern CPUs. In this work, we leverage the high degree of similarity between consecutive inputs in different DNN layers to improve the performance and energy efficiency of DNN inference on CPUs. To this end, we propose ReuseSense, a new hardware scheme that includes ReuseSensor, an engine that senses input similarity and efficiently generates only the compute and load instructions needed to evaluate a DNN layer. By intelligently reusing previously computed product values, ReuseSense bypasses computations whenever an input value is identical to a previous one. Additionally, it avoids redundant memory traffic by skipping the weight loads associated with the bypassed dot-product computations. Our experiments show that, on average across several DNNs, ReuseSense achieves an 8x speedup and a 74% reduction in total energy consumption over the baseline.
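To make the reuse idea concrete, the sketch below emulates it in software. This is only an illustrative analogue under assumptions: ReuseSense realizes the mechanism in hardware, and the names here (`layer_forward`, `prod_cache`) and the per-row product cache sized by the full weight matrix are simplifications of ours, not the paper's design. The point it demonstrates is that when an input element matches its value from the previous inference, both the multiplications and the corresponding weight-row loads can be skipped, and the cached products are accumulated instead.

```c
/* Illustrative software analogue of input-similarity reuse.
 * Hypothetical sketch: ReuseSense implements this in hardware;
 * all identifiers below are assumptions for illustration. */
#include <stdio.h>
#include <string.h>

#define IN  4   /* layer input size  */
#define OUT 3   /* layer output size */

/* Cached state carried over from the previous inference. */
static float prev_in[IN];
static float prod_cache[IN][OUT];   /* cached products prev_in[i] * W[i][j] */
static int   cache_valid = 0;

/* Compute out[j] = sum_i in[i] * W[i][j], reusing a cached product row
 * whenever in[i] equals the previous inference's value at position i.
 * A matching row skips both the multiplications and the weight loads. */
void layer_forward(const float in[IN], const float W[IN][OUT], float out[OUT])
{
    memset(out, 0, sizeof(float) * OUT);
    for (int i = 0; i < IN; i++) {
        if (!cache_valid || in[i] != prev_in[i]) {
            /* Input changed: load the weight row and recompute products. */
            for (int j = 0; j < OUT; j++)
                prod_cache[i][j] = in[i] * W[i][j];
            prev_in[i] = in[i];
        }
        /* Accumulate the (possibly reused) products into the outputs. */
        for (int j = 0; j < OUT; j++)
            out[j] += prod_cache[i][j];
    }
    cache_valid = 1;
}

int main(void)
{
    float W[IN][OUT] = {{1,2,3},{4,5,6},{7,8,9},{1,0,1}};
    float x1[IN] = {0.5f, 0.0f, 1.0f, 0.25f};
    float x2[IN] = {0.5f, 0.0f, 2.0f, 0.25f};  /* only position 2 differs */
    float y[OUT];

    layer_forward(x1, W, y);
    printf("y1 = %.2f %.2f %.2f\n", y[0], y[1], y[2]);
    layer_forward(x2, W, y);   /* rows 0, 1, 3 reused; row 2 recomputed */
    printf("y2 = %.2f %.2f %.2f\n", y[0], y[1], y[2]);
    return 0;
}
```

In this toy run, the second inference recomputes only one of four product rows; the other three weight rows are never touched, which mirrors the bypassed computations and skipped weight loads described above.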