Recently, deep learning has been an area of intense research. However, as a compute-intensive task, deep learning relies heavily on the scale of GPU memory, which is usually expensive and scarce. Although extensive works have been proposed for dynamic GPU memory management, they are difficult to apply to systems with multitasking dynamic workloads, such as in-database machine learning systems. In this paper, we demonstrate TENSILE, a method of managing GPU memory at tensor granularity to reduce the GPU memory peak, taking multitasking dynamic workloads into consideration. To the best of our knowledge, TENSILE is the first method designed to manage the GPU memory usage of multiple workloads. We implemented TENSILE on our own deep learning framework and evaluated its performance. The experimental results show that our method achieves lower time overhead than prior works while saving more GPU memory.