Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework

To safely deploy learning-based systems in highly uncertain environments, one must ensure that they always satisfy constraints. In this work, we propose a practical and theoretically justified approach to maintaining safety in the presence of dynamics uncertainty. Our approach leverages Bayesian meta-learning with last-layer adaptation: the expressiveness of neural-network features trained offline, paired with efficient last-layer online adaptation, enables the derivation of tight confidence sets which contract around the true dynamics as the model adapts online. We exploit these confidence sets to plan trajectories that guarantee the safety of the system. Our approach handles problems with high dynamics uncertainty where reaching the goal safely is initially infeasible by first exploring to gather data and reduce uncertainty, before autonomously exploiting the acquired information to safely perform the task. Under reasonable assumptions, we prove that our framework provides safety guarantees in the form of a single joint chance constraint. Furthermore, we use this theoretical analysis to motivate regularization of the model to improve performance. We extensively demonstrate our approach in simulation and on hardware.
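To make the idea of confidence sets that contract under last-layer adaptation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic Bayesian linear regression over fixed features with known Gaussian noise. The random `tanh` layer standing in for offline-trained features, the names `features`, `theta_true`, and `noise_std`, and all numeric values are hypothetical. The log-volume of the posterior confidence ellipsoid is tracked up to an additive constant, which is enough to see it shrink as data arrives.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, noise_std = 2, 5, 0.1  # illustrative sizes and noise level (assumed)

# Hypothetical stand-in for features trained offline: a fixed random tanh layer.
W = rng.normal(size=(d_feat, d_in))
b = rng.normal(size=d_feat)

def features(x):
    """Map an input to fixed features; only the last layer is adapted online."""
    return np.tanh(W @ x + b)

# Unknown "true" last-layer weights that generate the observations (assumed).
theta_true = rng.normal(size=d_feat)

# Gaussian prior on the last-layer weights: zero mean, identity precision.
precision = np.eye(d_feat)
precision_times_mean = np.zeros(d_feat)

log_volumes = []  # ellipsoid log-volume, up to an additive constant
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=d_in)
    phi = features(x)
    y = phi @ theta_true + noise_std * rng.normal()

    # Standard rank-one Bayesian linear regression update.
    precision += np.outer(phi, phi) / noise_std**2
    precision_times_mean += phi * y / noise_std**2

    # Ellipsoid volume scales with det(precision)^(-1/2).
    _, logdet = np.linalg.slogdet(precision)
    log_volumes.append(-0.5 * logdet)

posterior_mean = np.linalg.solve(precision, precision_times_mean)
```

Each update adds a positive semidefinite term to the precision matrix, so the ellipsoid volume never grows. A planner in the spirit of the abstract would then check constraints against every model in the current ellipsoid, which is why early exploration that shrinks the set can make an initially infeasible task feasible.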
