Control certificates based on barrier functions have been a powerful tool to generate provably safe control policies for dynamical systems. However, existing methods based on barrier certificates are typically designed for white-box systems with differentiable dynamics, which makes them inapplicable to many practical applications where the system is a black-box and cannot be accurately modeled. On the other hand, model-free reinforcement learning (RL) methods for black-box systems suffer from a lack of safety guarantees and low sample efficiency. In this paper, we propose a novel method that can learn safe control policies and barrier certificates for black-box dynamical systems, without requiring an accurate system model. Our method redesigns the loss function to back-propagate gradients to the control policy even when the black-box dynamical system is non-differentiable, and we show that the safety certificates hold on the black-box system. Empirical results in simulation show that our method can significantly improve the performance of the learned policies by achieving nearly 100% safety and goal-reaching rates using far fewer training samples, compared to state-of-the-art black-box safe control methods. Our learned agents can also generalize to unseen scenarios while keeping the original performance. The source code can be found at https://github.com/Zengyi-Qin/bcbf.
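To illustrate the idea of training against a non-differentiable black-box system, the sketch below enforces a control barrier function (CBF) decrease condition using only simulator queries and a zeroth-order (finite-difference) gradient. All names here (the double-integrator `blackbox_step`, the quadratic `barrier`, the linear `policy`) are illustrative assumptions, not the paper's actual architecture or loss; they only demonstrate how a barrier-violation loss can drive policy updates without analytic dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box dynamics: a double integrator that we can only
# query, not differentiate (stand-in for the paper's black-box simulator).
def blackbox_step(x, u, dt=0.1):
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u])

# Hand-chosen barrier candidate: h(x) >= 0 is intended to mean "safe".
def barrier(x):
    return 1.0 - x[0] ** 2 - 0.1 * x[1] ** 2

# Simple linear feedback policy u = -theta . x with learnable gains theta.
def policy(theta, x):
    return -theta @ x

# CBF decrease condition along closed-loop trajectories:
#   h(x') - h(x) + alpha * h(x) >= 0.
# The loss applies a hinge penalty to violations over sampled states.
def cbf_loss(theta, states, alpha=0.5):
    total = 0.0
    for x in states:
        u = policy(theta, x)
        x_next = blackbox_step(x, u)
        residual = barrier(x_next) - barrier(x) + alpha * barrier(x)
        total += max(0.0, -residual)
    return total / len(states)

# Zeroth-order gradient estimate: usable even though the simulator
# exposes no analytic derivatives.
def fd_grad(theta, states, eps=1e-4):
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (cbf_loss(theta + e, states) - cbf_loss(theta - e, states)) / (2 * eps)
    return g

states = rng.uniform(-1.0, 1.0, size=(64, 2))
theta = np.zeros(2)
loss_before = cbf_loss(theta, states)
for _ in range(300):
    theta -= 0.5 * fd_grad(theta, states)
loss_after = cbf_loss(theta, states)
```

With the zero-gain initialization some sampled states violate the decrease condition, so `loss_before` is positive, and the finite-difference updates push the gains toward a damping controller that shrinks the violation loss. The paper's actual loss redesign enables ordinary back-propagation instead of this zeroth-order estimate; this sketch only shows the shape of the training signal.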