Learning by interaction is the key to skill acquisition for most living organisms; this paradigm is formalized as Reinforcement Learning (RL). RL is effective at finding optimal policies that endow complex systems with sophisticated behavior. Model-based RL relies on a model of the system dynamics to find the optimal policy; such a model can be obtained either by formulating a mathematical model or through system identification. Dynamic models, however, are subject to aleatoric and epistemic uncertainties that can make the learned model diverge from the true dynamics and cause the RL algorithm to exhibit erroneous behavior. As a result, the RL process becomes sensitive to operating conditions and to changes in model parameters, and loses its generality. Addressing these problems would otherwise require intensive system identification for every individual system, even when the structure of the dynamics is shared, because a slight deviation in model parameters can render the model useless for RL. An oracle that can adaptively predict the remainder of a trajectory, regardless of these uncertainties, would help resolve the issue. This work presents a framework that facilitates system identification across different instances of the same dynamics class: a probability distribution over the dynamics, conditioned on observed data, is learned with variational inference, and its reliability is demonstrated by robustly solving different instances of control problems with a single model in model-based RL with maximum sample efficiency.
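The core idea of conditioning a distribution over dynamics on observed data can be illustrated with a deliberately small example. The sketch below (illustrative only, not the paper's implementation; the scalar system, prior, and learning rate are all assumptions) runs black-box variational inference over one unknown dynamics parameter `theta` of a scalar linear system `s' = theta*s + a + noise`, fitting a Gaussian `q(theta) = N(mu, sigma^2)` by stochastic gradient ascent on the ELBO with the reparameterization trick:

```python
# Illustrative sketch: infer a distribution over an unknown dynamics
# parameter from observed transitions, via variational inference.
import numpy as np

rng = np.random.default_rng(0)
true_theta, obs_noise = 0.8, 0.5  # ground truth for this instance

# Observed transitions from one instance of the dynamics class.
s = rng.normal(size=50)
a = rng.normal(size=50)
s_next = true_theta * s + a + obs_noise * rng.normal(size=50)

mu, log_sigma = 0.0, 0.0  # variational parameters of q(theta)
lr = 0.002                # prior over theta is standard normal N(0, 1)

for _ in range(5000):
    sigma = np.exp(log_sigma)
    eps = rng.normal()
    theta = mu + sigma * eps  # reparameterized sample theta ~ q
    # d/d(theta) of the Gaussian log-likelihood of the observed transitions.
    dll = np.sum((s_next - theta * s - a) * s) / obs_noise**2
    # ELBO gradients: likelihood term plus closed-form -KL(q || N(0,1)) term.
    mu += lr * (dll - mu)
    log_sigma += lr * (dll * eps * sigma + 1.0 - sigma**2)

print(f"posterior over theta: mu={mu:.2f}, sigma={np.exp(log_sigma):.2f}")
```

The posterior mean concentrates near the instance's true parameter while the posterior variance quantifies the remaining epistemic uncertainty; in the full framework, the same principle applies with an amortized, learned conditional distribution rather than per-instance optimization.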