Large-scale self-supervised models have recently revolutionized our ability to perform a variety of tasks within the vision and language domains. However, using such models for autonomous systems is challenging because of safety requirements: besides executing correct actions, an autonomous agent must also avoid high-cost and potentially fatal critical mistakes. Traditionally, self-supervised training mainly focuses on imitating previously observed behaviors, and the training demonstrations carry no notion of which behaviors should be explicitly avoided. In this work, we propose Control Barrier Transformer (ConBaT), an approach that learns safe behaviors from demonstrations in a self-supervised fashion. ConBaT is inspired by the concept of control barrier functions in control theory and uses a causal transformer that learns to predict safe robot actions autoregressively using a critic that requires minimal safety data labeling. During deployment, we employ a lightweight online optimization to find actions that ensure future states lie within the learned safe set. We apply our approach to different simulated control tasks and show that our method results in safer control policies compared to other classical and learning-based methods such as imitation learning, reinforcement learning, and model predictive control.
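The lightweight online optimization described in the abstract can be illustrated with a minimal sketch of a control-barrier-style safety filter. Everything here is an illustrative assumption rather than ConBaT's actual implementation: the toy 1-D `dynamics` stands in for a learned world model, `barrier` stands in for the learned safety critic, and actions are screened with a discrete-time control barrier condition so that predicted next states stay inside the safe set.

```python
# Hedged sketch of a CBF-style online safety filter. All names
# (barrier, dynamics, candidate actions) are toy assumptions,
# not the paper's actual model or API.

def barrier(state):
    # Toy "learned critic": the safe set is {x : |x| <= 1},
    # encoded as b(x) = 1 - x^2 >= 0 for safe states.
    return 1.0 - state * state

def dynamics(state, action):
    # Toy 1-D dynamics standing in for the world model's prediction.
    return state + 0.1 * action

def safe_action(state, preferred_action, candidates, alpha=0.5):
    """Pick the candidate closest to the policy's preferred action whose
    predicted next state satisfies the discrete-time CBF condition
    b(x') >= (1 - alpha) * b(x), i.e. the barrier value may shrink by
    at most a factor (1 - alpha) per step."""
    feasible = [
        a for a in candidates
        if barrier(dynamics(state, a)) >= (1 - alpha) * barrier(state)
    ]
    if not feasible:
        # No candidate satisfies the condition: fall back to the
        # least-unsafe option by maximizing the predicted barrier value.
        return max(candidates, key=lambda a: barrier(dynamics(state, a)))
    return min(feasible, key=lambda a: abs(a - preferred_action))

# Near the boundary of the safe set, the filter overrides an unsafe
# preferred action with the closest action that keeps the next state safe.
chosen = safe_action(0.95, 1.0, [-1.0, -0.5, 0.0, 0.5, 1.0])
print(chosen)  # → 0.0 (the preferred action +1.0 would leave the safe set)
```

In the paper's setting, the candidate search is replaced by gradient-based optimization over the transformer's predicted actions, but the filtering principle is the same: reject actions whose predicted rollouts violate the learned barrier condition.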