Many real-world physical control systems must satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, and uncalibrated sensors. Such effects perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain. This can affect both the policy's ability to maximize future rewards and the extent to which it satisfies constraints. We refer to this as constrained model misspecification. We present an algorithm that mitigates this form of misspecification and showcase its performance on multiple simulated MuJoCo tasks from the Real World Reinforcement Learning (RWRL) suite.