The Abstraction and Reasoning Corpus (ARC) is a challenging program induction dataset recently proposed by Chollet (2019). Here, we report the first set of results from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000). Although this subset of tasks contains considerable variation, our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 80% of tasks solved per participant, and with 65% of tasks solved by more than 80% of participants. Additionally, we find interesting patterns of behavioral consistency and variability in the action sequences produced during the generation process, in the natural language descriptions of the transformation for each task, and in the errors people made. Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task and compose a correct solution. Future modeling work could incorporate these findings, potentially by connecting the natural language descriptions we collected here to the underlying semantics of ARC.