Learning to quickly control a complex dynamical system (continuous and non-linear) in the presence of disturbances and uncertainties is desirable in many industrial and robotic applications. However, techniques that accomplish this task usually rely on a mathematical model of the system, which is often insufficient to anticipate the effects of time-varying and interrelated sources of non-linearity. Furthermore, most model-free approaches that have succeeded at this task rely on massive numbers of interactions with the system (usually in simulation) and are trained on specialized hardware to fit a highly parameterized controller. In this work, we learn to control one such dynamical system (the steering position of a DC motor) using the sample-efficient Neural Fitted Q technique. Using data collected from hardware interactions in the real world, we additionally build a simulator to experiment with a wide range of parameters and learning strategies. Using the parameters found in simulation, we successfully learn an effective control policy in 1 minute and 53 seconds in simulation and in 10 minutes and 35 seconds on the physical system.
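The sketch below illustrates the general Neural Fitted Q (batch fitted Q iteration) scheme named in the abstract: collect a batch of transitions, then repeatedly re-fit a neural Q-function on Bellman targets computed from the previous, frozen network. It is a minimal illustration only; the toy motor model, the discretized action set, the cost function, and the use of scikit-learn's MLPRegressor are all assumptions for the example and are not taken from the paper's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

GAMMA = 0.95
ACTIONS = np.array([-1.0, 0.0, 1.0])   # discretized voltage commands (assumed)

def toy_motor_step(state, action, dt=0.01):
    """Crude first-order DC-motor position model, used only to generate data."""
    pos, vel = state
    vel = vel + dt * (5.0 * action - 0.5 * vel)   # assumed gain and friction terms
    pos = pos + dt * vel
    return np.array([pos, vel])

def cost(state):
    """Immediate cost: distance of the position from the setpoint at 0."""
    return abs(state[0])

# 1) Collect a batch of transitions (s, a, c, s') by interacting with the system.
rng = np.random.default_rng(0)
transitions = []
state = np.array([1.0, 0.0])
for _ in range(2000):
    a = rng.choice(ACTIONS)
    nxt = toy_motor_step(state, a)
    transitions.append((state, a, cost(nxt), nxt))
    state = nxt if abs(nxt[0]) < 2.0 else np.array([rng.uniform(-1.0, 1.0), 0.0])

S  = np.array([t[0] for t in transitions])
A  = np.array([t[1] for t in transitions])
C  = np.array([t[2] for t in transitions])
S2 = np.array([t[3] for t in transitions])
X  = np.column_stack([S, A])            # Q-network input: (position, velocity, action)

# 2) Fitted Q iteration: re-fit the network on the whole batch each pass,
#    using the previous (frozen) network to compute the Bellman targets.
q_net = None
for _ in range(20):
    if q_net is None:
        y = C                                    # first pass: immediate costs only
    else:
        q_next = np.column_stack([
            q_net.predict(np.column_stack([S2, np.full(len(S2), a)]))
            for a in ACTIONS
        ])
        y = C + GAMMA * q_next.min(axis=1)       # minimum cost-to-go over actions
    q_net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500, random_state=0)
    q_net.fit(X, y)

def policy(s):
    """Greedy policy: choose the action with the lowest predicted cost-to-go."""
    q_vals = [q_net.predict(np.concatenate([s, [a]]).reshape(1, -1))[0] for a in ACTIONS]
    return ACTIONS[int(np.argmin(q_vals))]
```

Because the Q-function is refit on the entire stored batch at every iteration rather than updated from single online samples, each real-world transition is reused many times, which is the property that makes this family of methods sample-efficient enough for learning directly on hardware.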