One of the several obstacles to the widespread use of AI systems is the lack of interpretability requirements that would enable a layperson to ensure the safe and reliable behavior of such systems. We extend the analysis of an agent assessment module that lets an AI system execute high-level instruction sequences in simulators and answer user queries about its execution of action sequences. We show that such a primitive query-response capability is sufficient to efficiently derive a user-interpretable causal model of the system in stationary, fully observable, and deterministic settings. We also introduce dynamic causal decision networks (DCDNs) that capture the causal structure of STRIPS-like domains. Finally, we present a comparative analysis of different classes of queries in terms of the computational requirements for answering them and the effort required to evaluate their responses in order to learn the correct model.
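To make the primitive query-response capability concrete, the following is a minimal sketch, assuming a deterministic, fully observable STRIPS-like domain: a user submits an action sequence, the agent executes it in a simulator, and the response reports the failure step (if any) and the resulting state. All names here (`Action`, `Simulator`, `plan_outcome_query`) are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A STRIPS-like action: sets of ground literals."""
    name: str
    preconditions: frozenset  # must hold before execution
    add_effects: frozenset    # made true by execution
    del_effects: frozenset    # made false by execution

class Simulator:
    """Executes high-level instruction sequences over a STRIPS-like state."""
    def __init__(self, initial_state: frozenset):
        self.state = set(initial_state)

    def apply(self, action: Action) -> bool:
        # Deterministic transition: succeeds iff all preconditions hold.
        if not action.preconditions <= self.state:
            return False
        self.state -= action.del_effects
        self.state |= action.add_effects
        return True

def plan_outcome_query(initial_state: frozenset, plan: list) -> dict:
    """Primitive query: run a plan, report the first failing step (if any)
    and the final state. Full observability means the state is returned as-is."""
    sim = Simulator(initial_state)
    for i, action in enumerate(plan):
        if not sim.apply(action):
            return {"failed_at": i, "state": frozenset(sim.state)}
    return {"failed_at": None, "state": frozenset(sim.state)}

# Example: a one-action, blocks-world-style query.
pickup = Action("pickup-a",
                preconditions=frozenset({"clear-a", "handempty"}),
                add_effects=frozenset({"holding-a"}),
                del_effects=frozenset({"clear-a", "handempty"}))
print(plan_outcome_query(frozenset({"clear-a", "handempty"}), [pickup]))
```

Because the setting is stationary and deterministic, repeated queries of this form let a user systematically vary preconditions and observe effects, which is the sense in which the responses suffice to recover a causal model of each action.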