Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings.
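To make the augmentation idea concrete, here is a minimal, hypothetical sketch of a CoDA-style counterfactual swap. It assumes an object-oriented state where each object occupies a fixed slice of the observation vector, and a user-supplied predicate (`locally_independent`, an assumed helper, not from the paper) certifying that, for both transitions, the chosen object does not interact with the rest of the state. Under that local-independence assumption, exchanging one object's (sub-state, next-sub-state) pair between two real transitions yields a new transition that remains causally valid. All names here are illustrative, not the authors' implementation.

```python
import numpy as np

def coda_swap(t1, t2, obj_slice, locally_independent):
    """Sketch of a counterfactual swap between two replay transitions.

    t1, t2: transitions as dicts with 'obs' and 'next_obs' arrays.
    obj_slice: indices of one object's sub-state within the observation.
    locally_independent: predicate checking that, in both transitions, the
        object's dynamics do not interact with the remaining factors
        (i.e., both transitions lie in a locally factored region).
    Returns a new counterfactual transition, or None if the swap would
    not be causally valid.
    """
    if not (locally_independent(t1, obj_slice)
            and locally_independent(t2, obj_slice)):
        return None  # outside the locally factored region: do not swap
    obs = t1["obs"].copy()
    next_obs = t1["next_obs"].copy()
    # Graft object 2's factor (state and next state) into transition 1.
    obs[obj_slice] = t2["obs"][obj_slice]
    next_obs[obj_slice] = t2["next_obs"][obj_slice]
    return {"obs": obs, "next_obs": next_obs}

# Toy example: two 1-D objects that never interact, so the swap is
# always valid (the predicate trivially returns True here).
t1 = {"obs": np.array([0.0, 1.0]), "next_obs": np.array([0.1, 1.1])}
t2 = {"obs": np.array([5.0, 7.0]), "next_obs": np.array([5.1, 7.1])}
new = coda_swap(t1, t2, slice(1, 2), lambda t, s: True)
# new mixes object 0 from t1 with object 1 from t2:
# obs = [0.0, 7.0], next_obs = [0.1, 7.1]
```

From two observed transitions this produces a third, never-observed but dynamically consistent transition, which is how CoDA can enlarge the effective dataset for off-policy RL; the real algorithm additionally infers the local causal structure rather than assuming the predicate is given.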