Language-guided Embodied AI benchmarks that require an agent to navigate an environment and manipulate objects typically allow only one-way communication: the human user gives a natural language command, and the agent can only follow it passively. We present DialFRED, a dialogue-enabled embodied instruction following benchmark based on the ALFRED benchmark. DialFRED allows an agent to actively ask questions of the human user; the agent uses the additional information in the user's responses to better complete its task. We release a human-annotated dataset with 53K task-relevant questions and answers, together with an oracle for answering questions. To solve DialFRED, we propose a questioner-performer framework in which the questioner is pre-trained on the human-annotated data and fine-tuned with reinforcement learning. We make DialFRED publicly available and encourage researchers to propose and evaluate their solutions to building dialogue-enabled embodied agents.