We present a traffic simulation named DeepTraffic, in which the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers, in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition, in which thousands of participants actively searched the hyperparameter space.