Programs, which carry both semantic and structural information, play an important role in communication between humans and agents. Toward learning general program executors that unify perception, reasoning, and decision making, we formulate program-guided tasks, which require learning to execute a given program on an observed task specification. Furthermore, we propose the Program-guided Transformer (ProTo), which integrates both the semantic and the structural guidance of a program by using cross-attention and masked self-attention to pass messages between the specification and the routines in the program. ProTo executes a program in a learned latent space and has stronger representational capacity than previous neural-symbolic approaches. We demonstrate that ProTo significantly outperforms previous state-of-the-art methods on the GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo generalizes better to unseen, complex, and human-written programs.
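To make the message-passing idea concrete, the following is a minimal NumPy sketch of one update step, not the authors' implementation. It assumes that each routine and each element of the specification is already embedded as a vector, that cross-attention lets routines read from the specification, and that masked self-attention lets routines exchange information only along edges allowed by the program structure. The names `attention`, `program_guided_step`, and `structure_mask`, the chain-shaped toy program, and the absence of learned projections, multiple heads, and normalization layers are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values, mask=None):
    """Scaled dot-product attention; mask[i, j] = True lets query i see key j."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if mask is not None:
        # Disallowed pairs get a very negative score, so their weight is ~0.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


def program_guided_step(routines, spec, structure_mask):
    """One hypothetical update: routines read the spec, then each other."""
    # Cross-attention: routines (queries) attend to the specification.
    from_spec, _ = attention(routines, spec, spec)
    routines = routines + from_spec  # residual connection
    # Masked self-attention: routines attend only along allowed program edges.
    from_routines, weights = attention(
        routines, routines, routines, mask=structure_mask
    )
    return routines + from_routines, weights


rng = np.random.default_rng(0)
spec = rng.normal(size=(5, 8))      # 5 specification elements, dimension 8
routines = rng.normal(size=(3, 8))  # 3 routine embeddings, dimension 8

# Toy chain program (assumption): routine i sees itself and routine i - 1.
structure_mask = np.array(
    [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]],
    dtype=bool,
)

updated, weights = program_guided_step(routines, spec, structure_mask)
```

In this sketch the mask is the only place where program structure enters, while the content of the routine embeddings carries the semantics. A trained model would add learned query, key, and value projections and stack several such layers.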