Safety is critical for autonomous systems, and it remains a major obstacle to deploying learning-based policies in the real world. In particular, policies trained with reinforcement learning often fail to generalize to novel environments because they behave unsafely there. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we use a dual-policy setup in which a performance policy is trained on the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman equation derived from Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme that shields against unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments, one of which is photorealistic. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
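The supervisory control (shielding) scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function `shielded_action`, the sign convention (a critic value above the threshold means "predicted unsafe"), the threshold of 0.0, and the one-dimensional toy policies are all assumptions chosen only to show the idea of overriding the performance policy with the backup policy.

```python
from typing import Callable, Tuple

# Hypothetical types for illustration: a scalar state (position) and a
# scalar action (velocity command).
State = float
Action = float


def shielded_action(
    state: State,
    performance_policy: Callable[[State], Action],
    backup_policy: Callable[[State], Action],
    safety_critic: Callable[[State, Action], float],
    threshold: float = 0.0,
) -> Tuple[Action, bool]:
    """Return (action, overridden).

    Assumed convention: safety_critic(state, action) > threshold means the
    proposed action is predicted to lead to a safety violation.
    """
    proposed = performance_policy(state)
    if safety_critic(state, proposed) > threshold:
        # Shield: discard the proposed action and use the backup policy.
        return backup_policy(state), True
    return proposed, False


# Toy 1-D example (all values are made up): a wall sits near x = 0.9.
def toy_performance_policy(x: State) -> Action:
    return 0.5  # always drive forward toward the goal


def toy_backup_policy(x: State) -> Action:
    return -0.5  # always retreat from the wall


def toy_safety_critic(x: State, a: Action) -> float:
    # Positive when the next position x + a would pass the safety margin.
    return (x + a) - 0.9


if __name__ == "__main__":
    for x in (0.0, 0.6):
        action, overridden = shielded_action(
            x, toy_performance_policy, toy_backup_policy, toy_safety_critic
        )
        print(f"x={x}: action={action}, overridden={overridden}")
```

In this sketch the shield only intervenes near the wall, so the performance policy is left free to explore wherever the critic predicts no violation; in practice the critic would be a learned value function rather than a hand-written rule.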