Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns both the causal structure, in terms of a factored MDP, and a factored representation of the individual time-varying change factors. We prove that under standard assumptions, we can completely recover the causal graph representing the factored transition and reward function, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can consider general non-stationary scenarios with different function types and change frequencies, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of return, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
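To make the factored view concrete, the following is a minimal illustrative formulation of a factored non-stationary MDP; the notation (state components $s^i_t$, latent change factors $\theta^i_t$, $\theta^r_t$, and parent sets $\mathrm{PA}(\cdot)$) is ours and stands in for whatever parametrization the paper actually uses.
\begin{align}
  s^i_{t+1} &\sim p_i\!\left(s^i_{t+1} \,\middle|\, \mathrm{PA}\big(s^i_{t+1}\big) \subseteq \{s^1_t,\dots,s^d_t, a_t\},\; \theta^i_t \right), \qquad i = 1,\dots,d,\\
  r_t &\sim p_r\!\left(r_t \,\middle|\, \mathrm{PA}(r_t) \subseteq \{s^1_t,\dots,s^d_t, a_t\},\; \theta^r_t \right),
\end{align}
where each latent change factor $\theta^i_t$ (or $\theta^r_t$) may vary across or within episodes, and the parent sets $\mathrm{PA}(\cdot)$ define the edges of the causal graph over the factored transition and reward function that FANS-RL aims to recover.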