Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. In practice, however, a DRL agent may receive faulty observations caused by abrupt interferences such as black-outs, frozen screens, and adversarial perturbations. Designing a DRL algorithm that is resilient to these rare but mission-critical and safety-critical scenarios is an important yet challenging task. In this paper, we consider a resilient DRL framework with observational interferences. Under this framework, we discuss the importance of causal relations and propose a causal-inference-based DRL algorithm called the causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DRL environments under different types of interferences. Our experimental results show that the proposed CIQ method achieves higher performance and greater resilience against observational interferences.