This paper presents a safety-aware learning framework that combines an adaptive model learning method with barrier certificates for systems whose agent dynamics may be nonstationary. We use a sparse optimization technique to extract the dynamic structure of the model, and the resulting model is used together with control barrier certificates, which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, which ensures monotonic improvement of the feedback controller. In addition, we reformulate the (action-)value function approximation so that any kernel-based nonlinear function estimation method becomes applicable, and we then employ a state-of-the-art kernel adaptive filtering technique for this approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics are unknown and highly complex.
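To make the phrase "constrain feedback controllers only when safety is about to be violated" concrete, the following is a minimal sketch of a barrier-based safety filter on a toy problem. It is not the method of this paper: the scalar single-integrator dynamics x' = u, the safe set h(x) = x_max - x >= 0, the gain `alpha`, and the names `cbf_filter` and `simulate` are all assumptions chosen for illustration. For this toy system the barrier condition dh/dt >= -alpha * h(x) reduces to u <= alpha * (x_max - x), so the closest safe input to a nominal input is obtained by clipping.

```python
def cbf_filter(u_nom, x, x_max=1.0, alpha=2.0):
    """Return the input closest to u_nom that satisfies the barrier condition.

    Toy setting (an assumption, not the paper's model): dynamics x' = u and
    barrier h(x) = x_max - x. The condition dh/dt >= -alpha * h(x) becomes
    u <= alpha * (x_max - x), so the filter is a simple upper clip.
    """
    u_bound = alpha * (x_max - x)
    return min(u_nom, u_bound)


def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=500):
    """Forward-Euler rollout of x' = u under the filtered nominal input."""
    x = x0
    xs = [x]
    for _ in range(steps):
        u = cbf_filter(u_nom, x)  # equals u_nom unless the bound is active
        x = x + dt * u
        xs.append(x)
    return xs


trajectory = simulate()
```

Far from the boundary the filter returns the nominal input unchanged; it intervenes only once the bound alpha * (x_max - x) drops below the nominal input, and with these step sizes the rollout approaches x_max without crossing it.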