Two current methods used to train autonomous cars are reinforcement learning (RL) and imitation learning (IL). This research develops a new learning methodology and systematic approach, evaluated in both a simulated and a smaller real-world environment, by integrating supervised imitation learning into reinforcement learning to make the RL training data collection process more effective and efficient. By combining the two methods, the proposed approach leverages the advantages of both RL and IL. First, a real mini-scale robot car was assembled and trained on a 6-foot by 9-foot real-world track using imitation learning. During this process, a human expert driver drove the mini-scale robot car around the track with a handheld controller, and the actions were manually recorded through Microsoft AirSim's API, yielding 331 accurate, human-like reward training samples. An agent was then trained in the Microsoft AirSim simulator using reinforcement learning for 6 hours, with the 331 reward samples from imitation learning supplied as the initial training data. After this 6-hour training period, the mini-scale robot car was able to drive full laps around the 6-foot by 9-foot track autonomously, whereas under pure RL it was unable to complete a single lap even after 30 hours of training. With 80% less training time, the new methodology produced significantly more average reward per hour. The new methodology therefore saves a significant amount of training time and can be used to accelerate the adoption of RL in autonomous driving, which would help produce more efficient and better results in the long run when applied to real-life scenarios.
Key Words: Reinforcement Learning (RL), Imitation Learning (IL), Autonomous Driving, Human Driving Data, CNN
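For illustration, the demonstration-recording step described above can be sketched with AirSim's Python API. This is a minimal, hypothetical sketch rather than the study's actual code: the file name is an assumption, the sample count is taken from the abstract, and it assumes an AirSim release that exposes CarClient.getCarControls().

```python
# A minimal sketch (not the study's exact code) of logging expert demonstrations
# through Microsoft AirSim's Python API while a human drives with a controller.
import airsim
import numpy as np

client = airsim.CarClient()
client.confirmConnection()        # connect to the running AirSim session
client.enableApiControl(False)    # leave control with the human driver

samples = []                      # (frame, steering, throttle, speed) tuples
for step in range(331):           # the study collected 331 demonstration samples
    # Front-camera frame of the kind an imitation-learning CNN is trained on.
    response = client.simGetImages(
        [airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])[0]
    frame = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    frame = frame.reshape(response.height, response.width, 3)

    controls = client.getCarControls()   # steering/throttle applied by the human
    state = client.getCarState()         # car speed and pose at this instant
    samples.append((frame, controls.steering, controls.throttle, state.speed))

np.savez("il_demonstrations.npz",
         frames=np.stack([s[0] for s in samples]),
         actions=np.array([s[1:] for s in samples]))
```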
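The second step, seeding reinforcement learning with the imitation-learning samples, can be sketched as pre-filling a DQN-style experience replay buffer before RL training begins. The buffer class, the steering discretization, and the fixed expert reward below are illustrative assumptions, not the study's exact design.

```python
# A minimal sketch, assuming a DQN-style agent with an experience replay buffer,
# of how the 331 imitation-learning samples could pre-fill the buffer so RL
# starts from human-like transitions rather than purely random exploration.
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-capacity replay buffer of (state, action, reward, next_state, done)."""
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        idx = np.random.choice(len(self.buffer), batch_size, replace=False)
        return [self.buffer[i] for i in idx]

def discretize_steering(steering, bins=(-0.5, 0.0, 0.5)):
    # Illustrative: map the recorded continuous steering to a discrete action id.
    return int(np.argmin([abs(steering - b) for b in bins]))

data = np.load("il_demonstrations.npz")   # file written by the logging sketch above
frames, actions = data["frames"], data["actions"]

buffer = ReplayBuffer()
for t in range(len(frames) - 1):
    action_id = discretize_steering(actions[t][0])
    # Expert transitions are given a fixed positive reward here for illustration;
    # the study's actual reward shaping may differ.
    buffer.add(frames[t], action_id, 1.0, frames[t + 1], False)

# RL training then proceeds as usual, but its first gradient updates are drawn
# from these expert transitions.
```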