LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies: requirements that govern the ordering and sequencing of agent actions. For instance, agents may access sensitive data before authenticating users or process refunds to unauthorized payment methods, violations that require reasoning about sequences of actions rather than individual actions. Existing guardrails rely on imprecise natural-language instructions or post-hoc monitoring, and provide no formal guarantee that agents will satisfy temporal constraints. We present Agent-C, a novel framework that provides run-time guarantees that LLM agents adhere to formal temporal safety properties. Agent-C introduces a domain-specific language for expressing temporal properties (e.g., authenticate before accessing data), translates specifications to first-order logic, and uses SMT solving to detect non-compliant agent actions during token generation. When the LLM attempts to generate a non-compliant tool call, Agent-C leverages constrained generation to produce a compliant alternative, ensuring that every action the agent emits satisfies the specification. We evaluate Agent-C on two real-world applications, retail customer service and airline ticket reservation, across multiple open- and closed-source language models. Our results demonstrate that Agent-C achieves perfect safety (100% conformance, 0% harm) while improving task utility compared to state-of-the-art guardrails and unrestricted agents. On state-of-the-art closed-source models, Agent-C improves conformance from 77.4% to 100% for Claude Sonnet 4.5 and from 83.7% to 100% for GPT-5, while simultaneously increasing utility (from 71.8% to 75.2% and from 66.1% to 70.6%, respectively), establishing a new frontier for reliable agentic reasoning.
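To make the check concrete, the following is a minimal sketch of an SMT-based compliance test of the kind the abstract describes, written against Z3's Python bindings. The action codes, the first-order unrolling of "authenticate before accessing data", and the `is_compliant` helper are illustrative assumptions for this sketch, not Agent-C's actual DSL or API.

```python
# Minimal sketch: check whether appending a candidate action to an
# agent's action trace satisfies the temporal property
# "every data access is preceded by an authentication".
# Assumes the z3-solver package; action codes are hypothetical.
from z3 import Solver, Int, And, Or, Not, Implies, BoolVal, unsat

AUTHENTICATE, ACCESS_DATA = 0, 1  # hypothetical action codes

def is_compliant(trace, candidate):
    """Return True iff trace + [candidate] satisfies the property."""
    actions = trace + [candidate]
    # Symbolic copy of the trace: a[i] holds the i-th action code.
    a = [Int(f"a_{i}") for i in range(len(actions))]
    s = Solver()
    for var, act in zip(a, actions):
        s.add(var == act)
    # First-order property, unrolled over positions: if position i is an
    # ACCESS_DATA, some earlier position j < i must be an AUTHENTICATE.
    clauses = []
    for i in range(len(actions)):
        earlier_auth = [a[j] == AUTHENTICATE for j in range(i)]
        clauses.append(Implies(a[i] == ACCESS_DATA,
                               Or(earlier_auth) if earlier_auth
                               else BoolVal(False)))
    # The trace is compliant iff the negated property is unsatisfiable.
    s.add(Not(And(clauses)))
    return s.check() == unsat

print(is_compliant([], ACCESS_DATA))              # False: unauthenticated access
print(is_compliant([AUTHENTICATE], ACCESS_DATA))  # True: authenticated first
```

In this sketch a non-compliant candidate is simply rejected; the abstract's constrained-generation step would instead steer the model's next tool call toward an action for which this check succeeds.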