A long-held objective in AI is to build systems that understand concepts in a humanlike way. Setting aside the difficulty of building such a system, even trying to evaluate one is a challenge, due to present-day AI's relative opacity and its proclivity for finding shortcut solutions. This is exacerbated by humans' tendency to anthropomorphize, assuming that a system that can recognize one instance of a concept must also understand other instances, as a human would. In this paper, we argue that understanding a concept requires the ability to use it in varied contexts. Accordingly, we propose systematic evaluations centered around concepts, probing a system's ability to use a given concept in many different instantiations. We present case studies of such evaluations in two domains -- RAVEN (inspired by Raven's Progressive Matrices) and the Abstraction and Reasoning Corpus (ARC) -- that have been used to develop and assess abstraction abilities in AI systems. Our concept-based approach to evaluation reveals information about AI systems that conventional test sets would have left hidden.