Nested Mixture of Experts: Cooperative and Competitive Learning of Hybrid Dynamical System

Model-based reinforcement learning (MBRL) algorithms can attain significant sample efficiency but require an appropriate network structure to represent system dynamics. Current approaches include white-box modeling using analytic parameterizations and black-box modeling using deep neural networks. However, both can suffer from a bias-variance trade-off in the learning process, and neither provides a structured method for injecting domain knowledge into the network. As an alternative, gray-box modeling leverages prior knowledge in neural network training, but only for simple systems. In this paper, we devise a nested mixture of experts (NMOE) for representing and learning hybrid dynamical systems. An NMOE combines white-box and black-box models while optimizing the bias-variance trade-off. Moreover, an NMOE provides a structured method for incorporating various types of prior knowledge by training the associative experts cooperatively or competitively. This prior knowledge includes information on robots' physical contacts with the environment as well as their kinematic and dynamic properties. We demonstrate how to incorporate prior knowledge into our NMOE in various continuous control domains, including hybrid dynamical systems. We also show the effectiveness of our method in terms of data efficiency, generalization to unseen data, and the bias-variance trade-off. Finally, we evaluate our NMOE in an MBRL setup, where the model is integrated with a model-based controller and trained online.
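To make the cooperative versus competitive distinction concrete, the following is a minimal sketch of a flat, single-level mixture of one white-box and one black-box expert. It is not the paper's implementation: the free-fall model, the linear stand-in for a neural network, the function names, and the two loss forms (taken from the classical mixture-of-experts formulation) are all assumptions chosen for illustration, and the nesting of mixtures is omitted.

```python
import math


def softmax(logits):
    """Turn gating logits into weights that are positive and sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def white_box_expert(state, dt=0.01, g=9.81):
    """Hypothetical analytic model: one Euler step of free fall."""
    height, velocity = state
    return (height + velocity * dt, velocity - g * dt)


def black_box_expert(state, weights):
    """Stand-in for a learned network: a linear map with hypothetical weights."""
    height, velocity = state
    (a, b), (c, d) = weights
    return (a * height + b * velocity, c * height + d * velocity)


def cooperative_loss(gates, predictions, target):
    """Squared error of the gate-weighted blend: experts share one residual."""
    blended = [
        sum(g * p[k] for g, p in zip(gates, predictions))
        for k in range(len(target))
    ]
    return sum((b - t) ** 2 for b, t in zip(blended, target))


def competitive_loss(gates, predictions, target):
    """Gate-weighted sum of per-expert errors: each expert is judged alone."""
    return sum(
        g * sum((pk - tk) ** 2 for pk, tk in zip(p, target))
        for g, p in zip(gates, predictions)
    )


state = (1.0, 0.0)                       # height [m], velocity [m/s]
target = white_box_expert(state)         # pretend the true system is free fall
identity = ((1.0, 0.0), (0.0, 1.0))      # untrained black box: predicts no change
predictions = [white_box_expert(state), black_box_expert(state, identity)]
gates = softmax([0.0, 0.0])              # gating is undecided between the experts

coop = cooperative_loss(gates, predictions, target)
comp = competitive_loss(gates, predictions, target)
```

Under these assumptions the competitive loss is never smaller than the cooperative one for the same gates, since the squared error is convex. Minimizing it therefore pushes each expert to fit the data on its own, whereas the cooperative loss only requires the blend to fit.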
