Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems

Applying reinforcement learning (RL) to stabilization problems of real systems is difficult because an agent needs many experiences to learn an optimal policy and may select dangerous actions during exploration. If a mathematical model of the real system is known, a simulator is useful: it predicts the behavior of the real system from the model and a given system parameter vector, so experiences can be collected far more efficiently than through interactions with the real system. However, it is difficult to identify the system parameter vector accurately, and when there is an identification error, experiences obtained from the simulator may degrade the performance of the learned policy. We therefore propose a practical two-stage RL algorithm. In the first stage, we choose multiple system parameter vectors; each one yields a mathematical model, which we call a virtual system, and we obtain an optimal Q-function for each virtual system using the continuous deep Q-learning algorithm. In the second stage, we represent the Q-function for the real system by a linear function approximator whose basis functions are the optimal Q-functions learned in the first stage, and the agent learns this Q-function online through interactions with the real system. Numerical simulations demonstrate the usefulness of the proposed method.
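The second stage can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it beyond the idea of a linear combination of first-stage Q-functions is an assumption made for illustration. The basis Q-functions are hypothetical quadratic placeholders (`make_quadratic_q`) standing in for networks trained on virtual systems. The maximization over next actions uses a finite grid of candidate actions, whereas continuous deep Q-learning would provide it analytically. The weight update is the standard semi-gradient Q-learning rule for linear function approximation, which may differ from the update used in the paper.

```python
import numpy as np


def make_quadratic_q(a, b):
    """Hypothetical stand-in for a Q-function learned on one virtual system.

    Assumed form: Q(x, u) = -(a * x^2 + b * (u + 0.5 * x)^2) for scalar x, u.
    """
    def q(x, u):
        return -(a * x ** 2 + b * (u + 0.5 * x) ** 2)
    return q


def features(basis, x, u):
    """Feature vector: the value of every basis Q-function at (x, u)."""
    return np.array([q(x, u) for q in basis])


def q_real(weights, basis, x, u):
    """Q-function for the real system as a weighted sum of basis Q-functions."""
    return float(weights @ features(basis, x, u))


def td_update(weights, basis, transition, actions, gamma=0.99, lr=0.01):
    """One semi-gradient Q-learning step on the combination weights.

    `transition` is (x, u, reward, x_next) observed on the real system.
    `actions` is an assumed finite grid used to approximate max over u'.
    """
    x, u, reward, x_next = transition
    best_next = max(q_real(weights, basis, x_next, a) for a in actions)
    td_error = reward + gamma * best_next - q_real(weights, basis, x, u)
    new_weights = weights + lr * td_error * features(basis, x, u)
    return new_weights, td_error


if __name__ == "__main__":
    # Two placeholder basis functions, as if learned on two virtual systems.
    basis = [make_quadratic_q(1.0, 1.0), make_quadratic_q(2.0, 0.5)]
    weights = np.array([0.5, 0.5])
    candidate_actions = np.linspace(-1.0, 1.0, 41)
    weights, td_error = td_update(
        weights, basis, (1.0, -0.5, -1.0, 0.5), candidate_actions
    )
    print(weights, td_error)
```

Only the combination weights are adapted online in this sketch, so the number of parameters learned from real-system interactions equals the number of virtual systems rather than the size of a full network.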

