We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints. The language of LTL allows flexible description of tasks that may be unnatural to encode as a scalar cost function. We consider LTL-constrained PO as a systematic framework, decoupling task specification from policy selection, and as an alternative to the standard of cost shaping. With access to a generative model, we develop a model-based approach that enjoys a sample complexity analysis for guaranteeing both task satisfaction and cost optimality (through a reduction to a reachability problem). Empirically, our algorithm can achieve strong performance even in low-sample regimes.