Recently, equivariant neural network models have been shown to improve sample efficiency for tasks in computer vision and reinforcement learning. This paper explores this idea in the context of on-robot policy learning, in which a policy must be learned entirely on a physical robotic system without relying on a model, a simulator, or an offline dataset. We focus on applications of Equivariant SAC to robotic manipulation and explore several variations of the algorithm. Ultimately, we demonstrate that several non-trivial manipulation tasks can be learned entirely from on-robot experience in under one to two hours of wall-clock time.