Reinforcement Learning has been able to solve many complicated robotics tasks end-to-end, without any need for feature engineering. However, learning the optimal policy directly from the sensory inputs, i.e., the observations, often requires processing and storing huge amounts of data. In the context of robotics, data from real robot hardware is usually very costly to obtain, so solutions that achieve high sample efficiency are needed. We propose a method that learns a mapping from the observations into a lower-dimensional state space. This mapping is learned with unsupervised learning, using loss functions shaped to incorporate prior knowledge of the environment and the task. Using samples from the learned state space, the optimal policy is then learned quickly and efficiently. We test the method on several mobile robot navigation tasks in a simulation environment and on a real robot.
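To make the idea concrete, the following is a minimal sketch of this two-stage scheme: an encoder mapped from observations to a low-dimensional state space, trained unsupervised with prior-shaped losses, after which a standard RL algorithm can be run on the encoded states. The encoder architecture, the specific priors used here (a temporal-coherence term and a reward-separation term in the spirit of robotic priors), and all dimensions and names are illustrative assumptions, not the paper's exact losses.

```python
# Sketch only: prior-shaped unsupervised state-representation learning.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps high-dimensional observations to a low-dimensional state."""
    def __init__(self, obs_dim: int, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def prior_losses(s_t, s_next, reward):
    # Temporal-coherence prior: consecutive states should change slowly.
    temporal = ((s_next - s_t) ** 2).sum(dim=1).mean()
    # Reward-separation prior (illustrative): transitions with different
    # rewards should end up far apart in state space (hinge on distance).
    idx = torch.randperm(s_t.shape[0])
    diff_reward = (reward != reward[idx]).float()
    dist = ((s_next - s_next[idx]) ** 2).sum(dim=1)
    separation = (diff_reward * torch.relu(1.0 - dist)).mean()
    return temporal + separation

# Training loop with placeholder data; in practice (o_t, o_next, r) come
# from transitions collected by the robot. After the encoder is trained,
# a standard RL algorithm learns the policy on the encoded states.
obs_dim, state_dim = 100, 3
enc = Encoder(obs_dim, state_dim)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(1000):
    o_t = torch.randn(32, obs_dim)          # observation batch (placeholder)
    o_next = torch.randn(32, obs_dim)       # successor observations
    r = torch.randint(0, 2, (32,)).float()  # rewards (placeholder)
    loss = prior_losses(enc(o_t), enc(o_next), r)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the representation is trained without reward-maximization rollouts, the expensive real-robot samples are needed only for the (now low-dimensional) policy-learning stage, which is where the sample-efficiency gain comes from.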