We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse, and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for each activity: definition (it can differ by time, place, or person), instantiation in a simulator, and evaluation. BEHAVIOR addresses these with three innovations. First, we propose an object-centric, predicate logic-based description language for expressing an activity's initial and goal conditions, enabling generation of diverse instances for any activity. Second, we identify the simulator-agnostic features required by an underlying environment to support BEHAVIOR, and demonstrate their realization in one such simulator. Third, we introduce a set of metrics to measure task progress and efficiency, both in absolute terms and relative to human demonstrators. We include 500 human demonstrations in virtual reality (VR) to serve as the human ground truth. Our experiments demonstrate that even state-of-the-art embodied AI solutions struggle with the level of realism, diversity, and complexity imposed by the activities in our benchmark. We make BEHAVIOR publicly available at behavior.stanford.edu to facilitate and calibrate the development of new embodied AI solutions.
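To make the idea of object-centric initial and goal conditions concrete, the following is a minimal sketch, not BEHAVIOR's actual description language or metrics. It assumes a world state can be represented as a set of true facts over named objects, and it uses the fraction of satisfied goal predicates as one plausible notion of task progress. All names here (`Predicate`, `holds`, `goal_progress`, the object identifiers) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these names are not BEHAVIOR's actual API.


@dataclass(frozen=True)
class Predicate:
    name: str           # e.g. "on_top", "dusty"
    args: tuple         # identifiers of the objects the predicate refers to
    value: bool = True  # truth value the goal wants this predicate to have


def holds(pred, state):
    """Return True if the predicate has its desired truth value in `state`.

    `state` is a set of (name, args) facts that are currently true.
    """
    return ((pred.name, pred.args) in state) == pred.value


def goal_progress(goal, state):
    """Fraction of goal predicates satisfied in `state`, from 0.0 to 1.0."""
    if not goal:
        return 1.0
    return sum(holds(p, state) for p in goal) / len(goal)


# A toy "clear and clean the table" activity: initial facts and goal conditions.
initial_state = {
    ("on_top", ("plate_1", "table_1")),
    ("dusty", ("table_1",)),
}
goal = [
    Predicate("inside", ("plate_1", "sink_1")),
    Predicate("dusty", ("table_1",), value=False),
]

# State after the agent has wiped the table but not yet moved the plate.
later_state = {("on_top", ("plate_1", "table_1"))}

print(goal_progress(goal, initial_state))  # 0.0
print(goal_progress(goal, later_state))    # 0.5
```

Because the goal is stated over predicates rather than exact object poses, many different concrete scenes can satisfy the same definition, which is the property that allows diverse instances of one activity to be generated.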