Most reinforcement learning algorithms treat the context in which they operate as a stationary, isolated, and undisturbed environment. In real-world applications, however, environments constantly change due to a variety of external events. To address this problem, we study Markov decision processes (MDPs) under the influence of an external temporal process. We formalize this notion and discuss conditions under which the problem becomes tractable and admits suitable solutions. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance. We derive results on the sample complexity of the algorithm and study its dependence on the extent of non-stationarity of the environment. We then conduct experiments to illustrate our results in a classic control environment.
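For intuition only, here is a minimal sketch (not the paper's algorithm) of one tractable special case suggested by the abstract: if the external temporal process is itself Markov with a known kernel, the pair (environment state, process state) forms an augmented MDP on which exact policy iteration applies. All dimensions, kernels, and names below are hypothetical toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nE = 4, 2, 3   # env states, actions, external-process states (toy sizes)
gamma = 0.9

# Toy dynamics: P[e, a, s, :] is the env transition row under external state e;
# Q_ext[e, :] is the kernel of the external Markov process; R[e, s, a] is reward.
P = rng.dirichlet(np.ones(nS), size=(nE, nA, nS))
Q_ext = rng.dirichlet(np.ones(nE), size=nE)
R = rng.standard_normal((nE, nS, nA))

# Augmented MDP over x = (s, e): transitions factor as P[e,a,s,s'] * Q_ext[e,e'].
nX = nS * nE
P_aug = np.zeros((nA, nX, nX))
R_aug = np.zeros((nX, nA))
for e in range(nE):
    for s in range(nS):
        x = e * nS + s
        R_aug[x] = R[e, s]
        for a in range(nA):
            for e2 in range(nE):
                P_aug[a, x, e2 * nS: e2 * nS + nS] = Q_ext[e, e2] * P[e, a, s]

def policy_iteration(P, R, gamma):
    """Standard exact policy iteration on a finite MDP (P: (nA,nX,nX), R: (nX,nA))."""
    nA, nX, _ = P.shape
    pi = np.zeros(nX, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[pi, np.arange(nX)]
        R_pi = R[np.arange(nX), pi]
        V = np.linalg.solve(np.eye(nX) - gamma * P_pi, R_pi)
        # Policy improvement: greedy one-step lookahead.
        Qsa = R + gamma * P.transpose(1, 0, 2) @ V   # shape (nX, nA)
        pi_new = Qsa.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return pi, V
        pi = pi_new

pi_star, V_star = policy_iteration(P_aug, R_aug, gamma)
print("greedy policy over augmented states:", pi_star)
```

State augmentation is a standard device for restoring stationarity; the paper's actual algorithm, tractability conditions, and sample-complexity analysis may differ from this sketch.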