In this work, we first formulate the problem of goal-conditioned robotic water scooping with reinforcement learning. This task is challenging due to the complex dynamics of fluids and the multi-modal nature of the goals: the policy must achieve both position goals and water-amount goals, which yields a large, convoluted goal state space. To address these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that learns an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position-goal and amount-goal distributions to create a curriculum throughout the learning process. As a result, our method outperforms the baselines in simulation, achieving 5.46% and 8.71% amount errors on bowl-scooping and bucket-scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Beyond its effectiveness in simulation, our method generalizes efficiently to noisy real-robot water-scooping scenarios with different physical configurations and unseen settings, demonstrating strong efficacy and generalizability. Videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
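To illustrate the curriculum idea described above, the sketch below shows one plausible way to interpolate between an easy initial goal distribution and the full target distribution for the factorized position and amount goals. The abstract does not specify implementation details, so the class name, the uniform goal ranges, and the success-rate-based adaptation rule are all assumptions for illustration, not the authors' actual method.

```python
import numpy as np

class GoalCurriculum:
    """Hypothetical sketch of curriculum goal sampling: position and
    water-amount goals are drawn from uniform ranges that interpolate
    between an easy initial range and the full target range."""

    def __init__(self, easy_pos, full_pos, easy_amt, full_amt):
        # Each argument is a (low, high) tuple for a uniform range (assumed).
        self.easy_pos, self.full_pos = easy_pos, full_pos
        self.easy_amt, self.full_amt = easy_amt, full_amt
        self.progress = 0.0  # 0 = easiest curriculum stage, 1 = full task

    @staticmethod
    def _lerp_range(easy, full, t):
        # Linearly interpolate both endpoints of the sampling range.
        lo = (1.0 - t) * easy[0] + t * full[0]
        hi = (1.0 - t) * easy[1] + t * full[1]
        return lo, hi

    def sample_goal(self, rng):
        # Factorized goal: position and amount goals are sampled
        # independently from their current interpolated ranges.
        p_lo, p_hi = self._lerp_range(self.easy_pos, self.full_pos, self.progress)
        a_lo, a_hi = self._lerp_range(self.easy_amt, self.full_amt, self.progress)
        return rng.uniform(p_lo, p_hi), rng.uniform(a_lo, a_hi)

    def update(self, success_rate, threshold=0.8, step=0.05):
        # Widen the goal distributions once the policy is reliable at the
        # current difficulty (an assumed adaptation rule).
        if success_rate >= threshold:
            self.progress = min(1.0, self.progress + step)

# Example usage with made-up ranges (meters for position, ml for amount):
rng = np.random.default_rng(0)
curriculum = GoalCurriculum(easy_pos=(0.0, 0.1), full_pos=(-0.3, 0.3),
                            easy_amt=(90.0, 110.0), full_amt=(20.0, 200.0))
pos_goal, amt_goal = curriculum.sample_goal(rng)
curriculum.update(success_rate=0.85)  # distributions widen toward the full task
```

The key design point this sketch captures is that the two goal factors share a single curriculum progress variable, so the task grows harder along both axes together as the policy improves.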