System identification, also known as learning forward models, transfer functions, or system dynamics, has a long tradition in many fields of science and engineering. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping from the current state and action to the next state. This problem is commonly framed directly as a Supervised Learning problem. That approach faces several difficulties arising from the inherent complexity of the dynamics to be learned, for example delayed effects, strong non-linearity, non-stationarity, partial observability and, most importantly, error accumulation when using bootstrapped predictions (predictions based on past predictions) over long time horizons. Here we explore the use of Reinforcement Learning for this problem. We elaborate on why and how this problem fits naturally as a Reinforcement Learning problem, and present experimental results demonstrating that RL is a promising technique to solve this kind of problem.
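To make the error-accumulation issue concrete, the following is a minimal sketch, not taken from the paper: a one-step forward model fit by ordinary supervised regression on a toy 1-D system, then rolled out with bootstrapped predictions. The dynamics, the linear regressor, and all names (true_step, model_step) are illustrative assumptions.

```python
# Minimal sketch: supervised one-step forward model, then a bootstrapped
# rollout where the model feeds on its own predictions. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic dynamics s' = f(s, a): a mildly nonlinear 1-D system.
def true_step(s, a):
    return 0.9 * s + 0.1 * np.tanh(a)

# Collect (state, action) -> next-state pairs: the standard supervised setup.
S = rng.uniform(-1, 1, size=1000)
A = rng.uniform(-1, 1, size=1000)
S_next = true_step(S, A)

# Fit a simple linear model as the forward model (stand-in for any regressor).
X = np.column_stack([S, A, np.ones_like(S)])
w, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def model_step(s, a):
    return w[0] * s + w[1] * a + w[2]

# Bootstrapped rollout: each prediction is based on the previous prediction,
# so the small one-step error compounds over the horizon.
s_true = s_pred = 0.5
for t, a in enumerate(rng.uniform(-1, 1, size=50)):
    s_true = true_step(s_true, a)
    s_pred = model_step(s_pred, a)   # prediction based on a past prediction
    if t % 10 == 0:
        print(f"t={t:2d}  |error|={abs(s_true - s_pred):.4f}")
```

Running this shows the gap between s_true and s_pred growing with the horizon even though the one-step fit is accurate on the training distribution, which is precisely the failure mode the abstract attributes to the direct Supervised Learning formulation.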