As Deep Learning continues to drive a variety of applications in datacenters and HPC, there is a growing trend towards building large accelerators with several sub-accelerator cores/chiplets. This work looks at the problem of supporting multi-tenancy on such accelerators. In particular, we focus on the problem of mapping layers from several DNNs simultaneously on an accelerator. Given the extremely large search space, we formulate the search as an optimization problem and develop a specialized genetic algorithm called G# with custom operators to enable structured sample-efficient exploration. We quantitatively compare G# with several common heuristics, state-of-the-art optimization methods, and reinforcement learning methods across different accelerator settings (large/small accelerators) and different sub-accelerator configurations (homogeneous/heterogeneous), and observe that G# can consistently find better solutions. Further, to enable real-time scheduling, we also demonstrate a method to generalize the learnt schedules and transfer them to the next batch of jobs, reducing schedule compute time to near zero.
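The search described above can be illustrated, very loosely, as a genetic algorithm over layer-to-sub-accelerator assignments. The cost model, operators, and parameters below are hypothetical stand-ins for illustration only; they are not the paper's G# operators or its actual scheduling cost model.

```python
import random

def evaluate(schedule, latencies):
    # Toy fitness: makespan when each layer runs on its assigned
    # sub-accelerator (a stand-in for a real cost model; hypothetical).
    loads = {}
    for layer, acc in enumerate(schedule):
        loads[acc] = loads.get(acc, 0) + latencies[layer]
    return max(loads.values())

def crossover(a, b):
    # Single-point crossover between two parent assignments.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(schedule, num_accels, rate=0.1):
    # Randomly reassign each layer with small probability.
    return [random.randrange(num_accels) if random.random() < rate else acc
            for acc in schedule]

def genetic_schedule(latencies, num_accels, pop_size=40, generations=60):
    # Generic elitist GA loop: select, recombine, mutate, repeat.
    pop = [[random.randrange(num_accels) for _ in latencies]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: evaluate(s, latencies))
        elite = pop[: pop_size // 4]          # keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(elite, 2)
            children.append(mutate(crossover(p1, p2), num_accels))
        pop = elite + children
    return min(pop, key=lambda s: evaluate(s, latencies))

random.seed(0)
latencies = [5, 3, 8, 2, 7, 4, 6, 1]   # made-up per-layer costs
best = genetic_schedule(latencies, num_accels=3)
```

The actual G# algorithm replaces this toy makespan with a detailed accelerator cost model and uses custom genetic operators tailored to the multi-tenant mapping structure.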