Existing work in language grounding typically studies single environments. How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments pose diverse grounding challenges in the richness of their observation spaces, action spaces, language specifications, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments. Using SILG, we evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained language models (LMs). Our shared architecture achieves performance comparable to that of environment-specific architectures. Moreover, we find that many recent modelling advances do not yield significant gains on environments other than the one they were designed for, which highlights the need for a multi-environment benchmark. Finally, the best models significantly underperform humans on SILG, which suggests ample room for future work. We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.