Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments are characterized by two distinctive features: first, the state x(t) is observed at irregular time intervals, and second, the current action a(t) only affects the future state x(t + g) with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between Earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have achieved success in environments with irregularly observed states in time or known delays. However, environments involving both irregular observation times and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner, and is able to learn from an offline dataset sampled at irregular time intervals from an environment that has an inherent unknown constant delay. We show experimentally on continuous-time delayed environments that it is able to achieve near-expert policy performance.
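To make the abstract concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of how an MPC planner might use a learned continuous-time dynamics model in a delayed environment: candidate action sequences are sampled, the model is queried at arbitrary time offsets, and previously issued but not-yet-applied actions are carried in a delay buffer. The names `dynamics_model`, `reward_fn`, and `pending_actions` are hypothetical placeholders.

```python
import numpy as np

def plan_action(dynamics_model, reward_fn, state, pending_actions,
                dt=0.1, horizon=10, n_candidates=256, action_dim=1, rng=None):
    """Random-shooting MPC: return the first action of the best sampled sequence.

    dynamics_model(state, actions, t) -> predicted state t seconds ahead, where
    `actions` contains the delay buffer (issued but not yet applied actions)
    followed by the candidate future actions.  All names here are assumptions
    for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))

    best_return, best_seq = -np.inf, candidates[0]
    for seq in candidates:
        total, s = 0.0, state
        actions = np.concatenate([pending_actions, seq], axis=0)
        for k in range(horizon):
            # Query the continuous-time model at offset (k + 1) * dt; the offset
            # need not be uniform, which is what irregular sampling requires.
            s = dynamics_model(s, actions[: len(pending_actions) + k + 1], (k + 1) * dt)
            total += reward_fn(s, seq[k])
        if total > best_return:
            best_return, best_seq = total, seq
    return best_seq[0]
```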