Reinforcement learning has been used to train policies that outperform even the best human players in various games. However, achieving good performance requires a large amount of data, which in turn requires building large-scale frameworks and simulators. In this paper, we study how large-scale reinforcement learning can be applied to autonomous driving, analyze how the resulting policies perform as the experiment size is scaled, and identify the factors that contribute most to policy performance. To do this, we first introduce a hardware-accelerated autonomous driving simulator that lets us efficiently collect experience from billions of agent steps. This simulator is paired with a large-scale, multi-GPU reinforcement learning framework. We demonstrate that jointly scaling dataset size, model size, and the number of agent steps trained yields increasingly strong driving policies with respect to collisions, traffic rule violations, and progress. In particular, our best policy reduces the failure rate by 57% while improving progress by 23% compared to the current state-of-the-art machine learning policies for autonomous driving.