This paper considers the problem of learning temporal task specifications, e.g., automata and temporal logic formulas, from expert demonstrations. Task specifications are a class of sparse, memory-augmented rewards with explicit support for temporal and Boolean composition. Three features make learning temporal task specifications difficult: (1) the (countably) infinite number of tasks under consideration; (2) a priori ignorance of what memory is needed to encode the task; and (3) the discrete solution space, which is typically addressed by (brute-force) enumeration. To overcome these hurdles, we propose Demonstration Informed Specification Search (DISS): a family of algorithms requiring only black-box access to a maximum entropy planner and to a sampler of tasks consistent with labeled examples. DISS works by alternating between conjecturing labeled examples that make the provided demonstrations less surprising and sampling tasks consistent with the conjectured labeled examples. We provide a concrete implementation of DISS in the context of tasks described by Deterministic Finite Automata, and show that DISS is able to efficiently identify tasks from only one or two expert demonstrations.
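The alternation described above can be illustrated with a minimal Python sketch. It is a toy under loudly stated assumptions and not the authors' implementation: a small finite dictionary of string predicates stands in for the infinite space of automata, `surprisal` is a hypothetical two-outcome softmax standing in for the maximum entropy planner, `sample_task` is a hypothetical stand-in for the task sampler, and the conjecture step simply labels a rejected demonstration as a positive example. All names are invented for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Task = Callable[[str], bool]  # a task accepts or rejects a trace (here, a string)


def surprisal(task: Task, demo: str, rationality: float = 3.0) -> float:
    """Toy stand-in for a maximum entropy planner: returns -log P(demo | task).

    Assumption: a two-outcome softmax in which accepted traces score
    `rationality` and rejected traces score 0.
    """
    score = rationality if task(demo) else 0.0
    log_prob = score - math.log(math.exp(rationality) + 1.0)
    return -log_prob


def sample_task(candidates: Dict[str, Task], positives: Set[str],
                negatives: Set[str], rng: random.Random) -> Optional[str]:
    """Toy task sampler: pick a candidate consistent with the labeled examples."""
    consistent = [
        name for name, task in candidates.items()
        if all(task(x) for x in positives) and not any(task(x) for x in negatives)
    ]
    return rng.choice(consistent) if consistent else None


def diss_sketch(candidates: Dict[str, Task], demos: List[str],
                iterations: int = 20, seed: int = 0) -> Tuple[Optional[str], float]:
    """Alternate between sampling a task and conjecturing labeled examples."""
    rng = random.Random(seed)
    positives: Set[str] = set()
    negatives: Set[str] = set()  # unused by this toy conjecture rule
    best_name, best_cost = None, math.inf
    for _ in range(iterations):
        name = sample_task(candidates, positives, negatives, rng)
        if name is None:
            break  # no candidate is consistent with the conjectured examples
        task = candidates[name]
        cost = sum(surprisal(task, d) for d in demos)
        if cost < best_cost:
            best_name, best_cost = name, cost
        # Conjecture step (toy rule): labeling a rejected demonstration as
        # positive forces later samples to make that demonstration less surprising.
        rejected = [d for d in demos if not task(d)]
        if not rejected:
            break  # every demonstration is already accepted
        positives.add(rng.choice(rejected))
    return best_name, best_cost


# Hypothetical candidate tasks and demonstrations, for illustration only.
candidates: Dict[str, Task] = {
    "always_reject": lambda s: False,
    "ends_with_b": lambda s: s.endswith("b"),
    "starts_with_b": lambda s: s.startswith("b"),
}
name, cost = diss_sketch(candidates, demos=["ba", "bca"])
print(name, round(cost, 3))
```

In this toy, a single conjectured positive example is enough to rule out every candidate that rejects the demonstrations, so the loop settles on `starts_with_b` whichever task is sampled first. A realistic setting would need a richer conjecture rule and a sampler over an unbounded task class.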