We introduce a new reinforcement learning approach that combines a planning quasi-metric (PQM), which estimates the number of steps required to go from any state to any other, with task-specific "aimers" that compute a target state to reach a given goal. This decomposition allows the quasi-metric, a task-agnostic model that captures the environment's dynamics and can be learned densely and without supervision, to be shared across tasks. We achieve a multiple-fold training speed-up compared to recently published methods on the standard bit-flip problem and in the MuJoCo robotic-arm simulator.
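To make the decomposition concrete, the following is a minimal sketch, not the authors' implementation: it assumes simple MLP parameterizations, a discrete set of candidate actions, and a one-step dynamics predictor `model`; the names `QuasiMetric`, `Aimer`, and `select_action` are hypothetical.

```python
# Sketch of the PQM/aimer decomposition (illustrative assumptions throughout).
import torch
import torch.nn as nn

class QuasiMetric(nn.Module):
    """Task-agnostic estimate of the number of steps to go from state s to state t."""
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # step counts are non-negative
        )

    def forward(self, s, t):
        # Asymmetric by construction: d(s, t) != d(t, s) in general,
        # which is what makes it a quasi-metric rather than a metric.
        return self.net(torch.cat([s, t], dim=-1)).squeeze(-1)

class Aimer(nn.Module):
    """Task-specific head mapping (state, goal) to a target state to reach."""
    def __init__(self, state_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1))

def select_action(pqm, aimer, model, s, g, candidate_actions):
    """One plausible greedy controller under the stated assumptions: pick the
    action whose predicted next state is closest, under the quasi-metric, to
    the target state proposed by the aimer."""
    target = aimer(s, g)
    # `model` is an assumed one-step dynamics predictor: s' = model(s, a).
    next_states = torch.stack([model(s, a) for a in candidate_actions])
    dists = pqm(next_states, target.expand_as(next_states))
    return candidate_actions[dists.argmin().item()]
```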