In recent years, work on unmanned aerial vehicle (UAV) technology has expanded knowledge in the field, bringing to light new problems and challenges that require solutions. Because the technology allows processes usually carried out by people to be automated, it is also in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature using different machine learning strategies. Reinforcement learning (RL), a machine learning paradigm in which an agent interacts with an environment to solve a given task, is frequently used to train autonomous agents. However, learning autonomously can be time-consuming and computationally expensive, and may not be practical in highly complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for training: one without obstacles and one with obstacles. We also studied the influence of each technique. The results show that an agent trained simultaneously with both techniques obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent trained with both techniques achieves lower execution times and less dispersion during training.
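To make the two kinds of advice concrete, the following is a minimal sketch of interactive RL on a toy problem. It is not the drone setup of this study. It assumes a six-cell corridor, tabular Q-learning (the abstract does not name the learning algorithm), and two hypothetical trainer functions, `trainer_action` and `trainer_reward`. In this sketch, policy shaping means the trainer occasionally replaces the agent's chosen action, and reward shaping means the trainer adds feedback to the environment reward. All names and parameter values are illustrative assumptions.

```python
import random

N_STATES = 6        # assumed toy corridor: cells 0..5, the goal is the last cell
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Deterministic corridor dynamics with a small cost per move."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done


def trainer_action(state):
    """Hypothetical trainer advice for policy shaping: head for the goal."""
    return +1


def trainer_reward(state, action):
    """Hypothetical trainer feedback for reward shaping."""
    return 0.1 if action == +1 else -0.1


def train(episodes=200, advice_prob=0.3, use_policy_shaping=True,
          use_reward_shaping=True, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy choice made by the agent itself.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            # Policy shaping: the trainer sometimes overrides the action.
            if use_policy_shaping and rng.random() < advice_prob:
                action = trainer_action(state)
            nxt, reward, done = step(state, action)
            # Reward shaping: trainer feedback is added to the reward.
            if use_reward_shaping:
                reward += trainer_reward(state, action)
            # Standard Q-learning update.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Setting `use_policy_shaping` or `use_reward_shaping` to `False` gives a simple way to compare the influence of each technique in this toy setting. Any such comparison says nothing about the drone results reported here.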