Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. This problem has motivated many research directions on embodied language use. Current approaches focus on language as a communication tool in highly simplified and non-diverse social situations: the "naturalness" of language is reduced to a large vocabulary and high variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from the cognitive sciences could help AI draw a roadmap towards human-like intelligence, with a focus on its social dimensions. We then study the limits of a recent state-of-the-art (SOTA) Deep RL approach when tested on a first grid-world environment from the upcoming SocialAI benchmark, which is designed to assess the social skills of Deep RL agents. Videos and code are available at https://sites.google.com/view/socialai01.