CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
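To make the curriculum idea concrete, here is a minimal sketch of interpolating between an initial and a target task over a shared parametrization. It does not use the CausalWorld package: the class `TaskVariables`, its three fields, and the helpers `interpolate` and `curriculum` are hypothetical names chosen for illustration, and linear interpolation is an assumption rather than the method confirmed by the abstract.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskVariables:
    """Hypothetical subset of the causal variables a task exposes."""
    block_mass: float   # kilograms
    block_size: float   # edge length in meters
    goal_height: float  # target structure height in meters


def interpolate(start: TaskVariables, target: TaskVariables, alpha: float) -> TaskVariables:
    """Linearly blend two tasks; alpha=0 gives start, alpha=1 gives target."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    blended = {
        f.name: (1.0 - alpha) * getattr(start, f.name) + alpha * getattr(target, f.name)
        for f in fields(TaskVariables)
    }
    return TaskVariables(**blended)


def curriculum(start: TaskVariables, target: TaskVariables, num_stages: int) -> list:
    """Return num_stages tasks evenly spaced from start to target (inclusive)."""
    if num_stages < 2:
        raise ValueError("a curriculum needs at least two stages")
    return [
        interpolate(start, target, i / (num_stages - 1))
        for i in range(num_stages)
    ]


if __name__ == "__main__":
    easy = TaskVariables(block_mass=0.02, block_size=0.05, goal_height=0.0)
    hard = TaskVariables(block_mass=0.10, block_size=0.08, goal_height=0.15)
    for stage, task in enumerate(curriculum(easy, hard, num_stages=5)):
        print(stage, task)
```

Because every task is described by the same set of variables, the same blending rule could also be applied to only a subset of them (for example, object mass alone) to target one specific form of generalization.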
