Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems. Popular training pipelines that use these frameworks typically focus on supervised or unsupervised learning. How best to train reinforcement learning (RL) agents at scale is still an active research area. In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way. Specifically, we describe two architectures designed to make the best use of the resources available on a TPU Pod (a special configuration in a Google data center that features multiple TPU devices connected to each other by extremely low-latency communication channels).