Self-driving vehicles must act intelligently in diverse and difficult environments, marked by high-dimensional state spaces, a myriad of optimization objectives, and complex behaviors. Classical optimization and search techniques have traditionally been applied to self-driving, but they do not fully address operation in environments with high-dimensional states and complex behaviors. More recently, imitation learning has been proposed for the task, but obtaining enough training data is labor-intensive. Reinforcement learning has been proposed as a way to control the car directly, but this raises safety and comfort concerns. We propose using model-free reinforcement learning for the trajectory planning stage of self-driving and show that this approach allows us to operate the car in the safer, more general, and more comfortable manner that self-driving requires.