Reinforcement learning has achieved remarkable performance across a wide range of tasks. Nevertheless, several unsolved problems limit its application to real-world control. One of them is model misspecification, a situation in which an agent is trained and deployed in environments with different transition dynamics. We propose a novel framework that uses trajectory history and Partially Observable Markov Decision Process (POMDP) modeling to address this problem. Additionally, we put forward an efficient adversarial attack method to assist robust training. Our experiments in four Gym domains validate the effectiveness of our framework.