Large Batch Simulation for Deep Reinforcement Learning

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of "batch simulation": accepting and executing large batches of requests simultaneously. Beyond exposing large amounts of work at once, batch simulation allows implementations to amortize in-memory storage of scene assets, rendering work, data loading, and synchronization costs across many simulation requests, dramatically improving the number of simulated agents per GPU and overall simulation throughput. To balance DNN inference and training costs with faster simulation, we also build a computationally efficient policy DNN that maintains high task performance, and modify training algorithms to maintain sample efficiency when training with large mini-batches. By combining batch simulation and DNN performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system using a 64-GPU cluster over three days. We provide open-source reference implementations of our batch 3D renderer and simulator to facilitate incorporation of these ideas into RL systems.
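The "batch simulation" principle described above can be illustrated with a minimal sketch. This is not the authors' renderer or simulator: it is a toy 2D grid world, and every name in it (`SceneAsset`, `BatchSimulator`, `load_scene`, `MOVES`) is hypothetical. It only shows the two properties the abstract names: one call advances a whole batch of environments, and each scene's assets are loaded once and shared by every environment that uses that scene.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneAsset:
    """Hypothetical 2D stand-in for a 3D scene: grid size plus blocked cells."""
    width: int
    height: int
    walls: frozenset


# Assumed action set for the toy world (not the paper's action space).
MOVES = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


def load_scene(scene_id):
    # Placeholder for an expensive load of meshes and textures from disk.
    return SceneAsset(width=4, height=4, walls=frozenset({(1, 1)}))


@dataclass
class BatchSimulator:
    scene_ids: list          # scene used by each environment in the batch
    positions: list          # (x, y) agent position per environment
    asset_cache: dict = field(default_factory=dict)
    loads: int = 0           # number of asset loads actually performed

    def _asset(self, scene_id):
        # Load each scene once; all environments in that scene share the copy.
        if scene_id not in self.asset_cache:
            self.asset_cache[scene_id] = load_scene(scene_id)
            self.loads += 1
        return self.asset_cache[scene_id]

    def step(self, actions):
        """Advance every environment by one action in a single call."""
        if len(actions) != len(self.positions):
            raise ValueError("expected one action per environment")
        for i, action in enumerate(actions):
            scene = self._asset(self.scene_ids[i])
            dx, dy = MOVES[action]
            x, y = self.positions[i]
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < scene.width and 0 <= ny < scene.height
            # Moves into a wall or off the grid leave the agent in place.
            if inside and (nx, ny) not in scene.walls:
                self.positions[i] = (nx, ny)
        return list(self.positions)


# Four environments spread over two scenes, stepped with one request.
sim = BatchSimulator(
    scene_ids=["a", "a", "b", "a"],
    positions=[(0, 0), (1, 0), (0, 0), (3, 3)],
)
observations = sim.step(["forward", "forward", "right", "right"])
```

In this sketch the per-environment loop stands in for work that a real batch renderer would execute in parallel on the GPU; the point is the interface, where the caller submits all requests at once so that loading, storage, and synchronization costs are paid per batch rather than per environment.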
