Autonomous robots require high degrees of cognitive and motor intelligence to enter our everyday lives. In unstructured environments and in the presence of uncertainties, such degrees of intelligence are not easy to obtain. Reinforcement learning algorithms have proven capable of solving complicated robotics tasks in an end-to-end fashion, without any need for hand-crafted features or policies. Especially in the context of robotics, where the cost of real-world data is usually extremely high, reinforcement learning solutions that achieve high sample efficiency are needed. In this paper, we propose a framework that combines the learning of a low-dimensional state representation, from high-dimensional observations coming from the robot's raw sensory readings, with the learning of the optimal policy, given the learned state representation. We evaluate our framework in the context of mobile robot navigation with continuous state and action spaces. Moreover, we study the problem of transferring what is learned in the simulated virtual environment to the real robot, without further retraining on real-world data, in the presence of visual and depth distractors such as lighting changes and moving obstacles.