DRL: Deep Reinforcement Learning for Intelligent Robot Control -- Concept, Literature, and Future

This work is motivated by the combination of machine learning (for generating machine intelligence), computer vision (for better perception of the environment), and robotic systems (for controlled interaction with the environment). Its ultimate goal is a vision-based learning framework for intelligent robot control, that is, a vision-based learning robot. Specifically, the work introduces deep reinforcement learning as the learning framework: a general-purpose framework for AI (AGI), meaning that it is application-independent and platform-independent. For robot control, the framework proposes a high-level control architecture that is independent of the low-level control, so the two required levels of control can be developed separately. The high-level control creates the intelligence required to control a platform from low-level control data recorded on that same platform while it is operated by a trainer. The recorded data simply capture the trainer's successful and failed experiences, or sequences of experiments, on that robotic platform. Each recorded sequence is composed of observation data (sensor input), generated reward (feedback value), and action data (controller output). In the experimental platform and experiments, vision sensors are used to perceive the environment, different kinematic controllers create the required motion commands depending on the platform's application, deep learning approaches generate the required intelligence, and reinforcement learning techniques incrementally improve that intelligence until the robot accomplishes its mission.
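
The recorded sequences described above (observation, reward, action) could be represented along the following lines. This is only an illustrative sketch, not the data format used in this work: the names `Step`, `Episode`, `succeeded`, and `discounted_returns`, the success threshold, and the discount factor `gamma` are all assumptions introduced here for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Step:
    """One recorded time step (hypothetical layout)."""
    observation: Sequence[float]  # sensor input, e.g. flattened image features (assumed)
    action: Sequence[float]       # controller output, e.g. a motion command (assumed)
    reward: float                 # feedback value generated for this step


@dataclass
class Episode:
    """One sequence of experience recorded while a trainer operates the robot."""
    steps: List[Step]

    def total_reward(self) -> float:
        # Sum of the feedback values over the whole sequence.
        return sum(step.reward for step in self.steps)

    def succeeded(self, threshold: float = 0.0) -> bool:
        # Assumption: an episode counts as successful when its total
        # reward exceeds a chosen threshold.
        return self.total_reward() > threshold


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return the discounted sum of future rewards for every step.

    This is the standard reinforcement learning return,
    G_t = r_t + gamma * G_{t+1}, computed backwards over the sequence.
    """
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

For instance, an `Episode` could be built from a trainer's recorded steps and passed through `discounted_returns` to obtain per-step learning targets. How the framework actually consumes the recorded data is not specified in the abstract.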
