One of the major challenges in Deep Reinforcement Learning for control is the extensive training required to learn a policy. Motivated by this, we present the design of Control-Tutored Deep Q-Networks (CT-DQN), a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed from an approximate model of the system, without assuming full knowledge of the system's dynamics, and is not expected to achieve the control objective when used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN achieves better or equivalent data efficiency compared with classic function-approximation solutions.
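To make the tutor's role concrete, below is a minimal Python sketch of how an exogenous control law might partially guide exploration during learning. The names `tutor_policy` and `beta`, and the specific mixing rule, are illustrative assumptions for this sketch, not the exact mechanism specified in the paper.

```python
import random

import numpy as np


def select_action(q_values: np.ndarray, state, tutor_policy,
                  epsilon: float = 0.1, beta: float = 0.2) -> int:
    """Sketch of tutor-guided exploration in a CT-DQN-style agent.

    With probability ``beta`` the control tutor suggests the action;
    otherwise a standard epsilon-greedy choice over the Q-network's
    estimates is made. ``tutor_policy`` and ``beta`` are hypothetical
    names introduced here for illustration.
    """
    if random.random() < beta:
        # Exogenous control law, built from an approximate system model.
        # It need not solve the task on its own; it only biases exploration.
        return tutor_policy(state)
    if random.random() < epsilon:
        # Ordinary random exploration, as in vanilla DQN.
        return random.randrange(len(q_values))
    # Greedy action with respect to the learned Q-values.
    return int(np.argmax(q_values))
```

Under this kind of scheme, the agent still learns from all transitions in the usual DQN fashion; the tutor only shapes which actions are tried, which is how occasional suggestions can reduce learning time without requiring the tutor to be a competent stand-alone controller.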