Several approaches have been developed for answering users' specific questions about AI behavior and for assessing an AI system's core functionality in terms of primitive executable actions. However, the problem of summarizing an AI agent's broad capabilities for a user has received little research attention. The problem is aggravated by the fact that users may not know which questions to ask in order to understand a system's limits and capabilities. This paper presents an algorithm for discovering from scratch the suite of high-level "capabilities" that an AI system with arbitrary internal planning algorithms/policies can perform, and for computing conditions that describe the applicability and effects of these capabilities in user-interpretable terms. Given a set of user-interpretable state properties, an AI agent, and a simulator that the agent can interact with, where the agent may use arbitrary decision-making paradigms over primitive operations unknown to the user, our algorithm returns a set of high-level capabilities together with their descriptions in the user's vocabulary. Empirical evaluation on several game-based scenarios shows that this approach efficiently learns interpretable descriptions of various types of AI agents in deterministic, fully observable settings. User studies show that such interpretable descriptions are easier to understand and reason with than descriptions based on the agent's primitive actions.
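To make the setup concrete, the following is a minimal, hypothetical Python sketch of the interface the abstract describes: the algorithm observes a black-box agent acting in a simulator and abstracts the transitions it sees into capabilities over user-interpretable state properties. All names here (`Capability`, `discover_capabilities`, `simulator.reset/step/done`, `agent.act`) are illustrative assumptions, not the paper's actual API, and the loop stands in for a more involved discovery procedure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """A high-level capability expressed in the user's vocabulary."""
    precondition: frozenset  # user-interpretable properties that must hold before
    effect: frozenset        # user-interpretable properties that hold afterwards

def discover_capabilities(predicates, agent, simulator, num_episodes=50):
    """Observe the black-box agent in the simulator and abstract its
    transitions into capabilities over the given interpretable predicates.

    `predicates` maps property names to functions state -> bool.
    """
    capabilities = set()
    for _ in range(num_episodes):
        state = simulator.reset()
        while not simulator.done(state):
            before = frozenset(n for n, p in predicates.items() if p(state))
            # The agent's primitive action and decision-making remain opaque.
            state = simulator.step(state, agent.act(state))
            after = frozenset(n for n, p in predicates.items() if p(state))
            if before != after:  # a change at the abstract level marks a capability
                capabilities.add(Capability(precondition=before, effect=after))
    return capabilities
```

In this sketch, each capability records only one observed precondition/effect pair; the described algorithm would additionally generalize such observations into conditions on when each capability applies.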