This paper develops a new approach for estimating an interpretable, relational model of a black-box autonomous agent that can plan and act. Our main contributions are a new paradigm for estimating such models using a minimal query interface with the agent, and a hierarchical querying algorithm that generates an interrogation policy for estimating the agent's internal model in a vocabulary provided by the user. Empirical evaluation shows that, despite the intractable search space of possible agent models, our approach allows correct and scalable estimation of interpretable agent models for a wide class of black-box autonomous agents. Our results also show that this approach can use predicate classifiers to learn interpretable models of planning agents that represent states as images.