We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute tensors, based on which GSPMD will parallelize the computation. Its representation of partitioning is simple yet general, allowing it to express different or mixed paradigms of parallelism on a wide variety of models. GSPMD infers the partitioning for every operator based on limited user annotations, making it convenient to scale existing single-device programs. It solves several technical challenges for production usage, allowing GSPMD to achieve 50% to 62% compute utilization on up to 2048 Cloud TPUv3 cores for models with up to one trillion parameters.
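To make the annotation workflow concrete, below is a minimal sketch using the JAX frontend, which lowers sharding annotations to GSPMD inside the XLA compiler. The mesh axis names, tensor shapes, and function names here are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumed JAX frontend; JAX lowers these annotations to GSPMD
# in XLA). Mesh axes, shapes, and names are illustrative, not from the paper.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible devices into a logical 2D mesh: one axis for data
# parallelism and one for model parallelism.
n = len(jax.devices())
mesh = Mesh(mesh_utils.create_device_mesh((n, 1)), ("data", "model"))

@jax.jit
def layer(x, w):
    # Written exactly as single-device code; the partitioning of this matmul
    # is inferred from the shardings attached to its inputs below.
    return jnp.dot(x, w)

# The only user hints: shard the batch dimension of x over the "data" axis
# and the output-feature dimension of w over the "model" axis.
x = jax.device_put(jnp.ones((8, 128)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((128, 256)), NamedSharding(mesh, P(None, "model")))

y = layer(x, w)  # runs in parallel; y is typically inferred as P("data", "model")
```

Note how the function body contains no distribution logic at all: the two `device_put` annotations are the "few annotations" the abstract refers to, and the compiler propagates shardings to every intermediate tensor.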