Serverless computing has made it easier than ever to deploy applications over scalable cloud resources, all the while driving higher utilization for cloud providers. While this technique has worked well for easily divisible resources like CPU and local DRAM, it has struggled to incorporate more expensive and monolithic resources like GPUs or other application accelerators. We cannot simply slap a GPU on a FaaS platform and expect to keep all the benefits serverless promises. We need a more tailored approach if we want to best utilize these critical resources. In this paper we present Kernel-as-a-Service (KaaS), a serverless interface to GPUs. In KaaS, GPUs are first-class citizens that are invoked just like any other serverless function. Rather than mixing host and GPU code as is typically done, KaaS runs graphs of GPU-only code while host code runs on traditional functions. The KaaS system manages GPU memory and schedules user kernels across the entire pool of available GPUs rather than relying on static allocations. This approach allows us to more effectively share expensive GPU resources, especially in multitenant environments like the cloud. We add support for KaaS to the Ray distributed computing framework and evaluate it with workloads including a TVM-based deep learning compiler and a BLAS library. Our results show that KaaS is able to drive up to 50x higher throughput and 16x lower latency when GPU resources are contended.