Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that can trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution because of their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid-reasoning runtime safeguard for embodied agents built on executable, predicate-based safety logic. RoboSafe integrates two complementary reasoning processes over a Hybrid Long-Short Safety Memory. First, a Backward Reflective Reasoning module continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. Second, a Forward Predictive Reasoning module anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
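
To make the idea of safety predicates that are "executable as code" concrete, the following is a minimal Python sketch of how a runtime guardrail might combine a temporal predicate over a short-term trajectory with a context-aware predicate over the next action. It is an illustration only and not RoboSafe's released implementation: the names `Step`, `Guardrail`, `stove_left_on`, and `heats_metal_in_microwave`, the symbolic observation dictionary, and the hand-written rules are all assumptions. In the system described above, such predicates would be inferred by the reasoning modules rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# All names below are hypothetical illustrations, not RoboSafe's actual API.
@dataclass
class Step:
    action: str  # e.g. "turn_on"
    target: str  # e.g. "stove"


Trajectory = List[Step]
Observation = Dict[str, bool]  # assumed symbolic facts extracted from perception


def stove_left_on(traj: Trajectory, max_steps: int = 3) -> bool:
    """Temporal predicate: True (violation) if the stove was turned on and
    more than `max_steps` later steps passed without turning it off."""
    on_index = None
    for i, step in enumerate(traj):
        if step.target == "stove":
            if step.action == "turn_on":
                on_index = i
            elif step.action == "turn_off":
                on_index = None
    return on_index is not None and len(traj) - 1 - on_index > max_steps


def heats_metal_in_microwave(step: Step, obs: Observation) -> bool:
    """Context predicate: True (violation) if the candidate action would
    start the microwave while metal is observed inside it."""
    return (
        step.action == "turn_on"
        and step.target == "microwave"
        and obs.get("metal_in_microwave", False)
    )


@dataclass
class Guardrail:
    backward: List[Callable[[Trajectory], bool]]
    forward: List[Callable[[Step, Observation], bool]]
    short_term: Trajectory = field(default_factory=list)

    def check(self, candidate: Step, obs: Observation) -> str:
        # Forward check: would the candidate action itself be hazardous?
        if any(pred(candidate, obs) for pred in self.forward):
            return "block"
        # Backward check: evaluate temporal predicates on the trajectory as
        # it would look if the candidate were executed.
        hypothetical = self.short_term + [candidate]
        if any(pred(hypothetical) for pred in self.backward):
            return "replan"
        self.short_term.append(candidate)
        return "allow"
```

In this sketch, evaluating the temporal predicates on the trajectory including the candidate step means a corrective action such as turning the stove off is allowed even after a violation has been flagged, while unrelated actions keep returning `"replan"`.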


