We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints. The language of LTL allows flexible description of tasks that may be unnatural to encode as a scalar cost function. We consider LTL-constrained PO as a systematic framework that decouples task specification from policy selection and serves as an alternative to the standard practice of cost shaping. With access to a generative model, we develop a model-based approach that enjoys a sample complexity analysis for guaranteeing both task satisfaction and cost optimality (through a reduction to a reachability problem). Empirically, our algorithm can achieve strong performance even in low-sample regimes.