There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer. This framework has been used to evaluate human-machine creative collaborations across story and music generation, interactive block building, and exploration of molecular mechanisms in cancer. These activities are fundamentally different from the more constrained tasks performed by most contemporary personal assistants as they are generally open-ended, with no single correct solution, and often no obvious completion criteria. We identified the Key Properties that must be exhibited by successful systems. From there we identified "Hallmarks" of success -- capabilities and features that evaluators can observe that would be indicative of progress toward achieving a Key Property. In addition to being a framework for assessment, the Key Properties and Hallmarks are intended to serve as goals in guiding research direction.