A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, their speed and scalability are limited. We address these limitations through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments of human-likeness. We demonstrate the effectiveness of our automated NTT on a navigation task in a complex 3D environment. We investigate six classification models to shed light on the types of architectures best suited to this task, and validate them against data collected through a human NTT. Our best models achieve high accuracy when distinguishing true human behavior from agent behavior. At the same time, we show that predicting finer-grained human assessments of agents' progress towards human-like behavior remains unsolved. Our work takes an important step towards agents that more effectively learn complex human-like behavior.