Handling various robot action-language translation tasks flexibly is an essential requirement for natural interaction between a robot and a human. Previous approaches require changing the configuration of the model architecture per task during inference, which undermines the premise of multi-task learning. In this work, we propose the paired gated autoencoders (PGAE) for flexible translation between robot actions and language descriptions in a tabletop object manipulation scenario. We train our model in an end-to-end fashion by pairing each action with appropriate descriptions that contain a signal indicating the translation direction. During inference, our model can flexibly translate from action to language and vice versa according to the given language signal. Moreover, with the option to use a pretrained language model as the language encoder, our model has the potential to recognise unseen natural language input. Another capability of our model is that it can recognise and imitate the actions of another agent by utilising robot demonstrations. The experimental results highlight the flexible bidirectional translation capability of our approach alongside the ability to generalise to the actions of the opposite-sitting agent.
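To make the core mechanism concrete, below is a minimal, illustrative sketch of the idea described above: a single model encodes both modalities into a shared latent code via a gating layer, and the translation direction is selected solely by a signal in the language input (e.g. "describe" for action-to-language, "execute" for language-to-action). All module names, layer sizes, and the particular gating formulation here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PGAESketch(nn.Module):
    """Sketch of a paired gated autoencoder with a direction signal.

    Hypothetical architecture: GRU encoders per modality, a gated fusion
    into a shared latent code, and GRU decoders per modality.
    """

    def __init__(self, vocab_size, action_dim, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lang_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.act_enc = nn.GRU(action_dim, hidden, batch_first=True)
        # Gated fusion: both modalities are mixed into one shared code,
        # so the same latent space serves both translation directions.
        self.gate = nn.Linear(2 * hidden, hidden)
        self.lang_dec = nn.GRU(hidden, hidden, batch_first=True)
        self.lang_out = nn.Linear(hidden, vocab_size)
        self.act_dec = nn.GRU(hidden, hidden, batch_first=True)
        self.act_out = nn.Linear(hidden, action_dim)

    def forward(self, desc_tokens, action_seq, direction):
        # desc_tokens: (B, T_l) token ids; the first token carries the
        #              translation-direction signal.
        # action_seq:  (B, T_a, action_dim) joint/visual features.
        _, h_l = self.lang_enc(self.embed(desc_tokens))
        _, h_a = self.act_enc(action_seq)
        fused = torch.cat([h_l[-1], h_a[-1]], dim=-1)
        z = torch.tanh(self.gate(fused))  # shared latent code
        if direction == "describe":       # action -> language
            steps = desc_tokens.size(1)
            out, _ = self.lang_dec(z.unsqueeze(1).repeat(1, steps, 1))
            return self.lang_out(out)     # token logits
        else:                             # language -> action
            steps = action_seq.size(1)
            out, _ = self.act_dec(z.unsqueeze(1).repeat(1, steps, 1))
            return self.act_out(out)      # predicted action features

# Usage: one model, two translation directions, switched only by the
# language signal, with no change to the architecture at inference.
model = PGAESketch(vocab_size=50, action_dim=8)
tokens = torch.randint(0, 50, (2, 6))
actions = torch.randn(2, 10, 8)
logits = model(tokens, actions, direction="describe")  # (2, 6, 50)
traj = model(tokens, actions, direction="execute")     # (2, 10, 8)
```

The key design point the sketch mirrors is that switching tasks requires no reconfiguration of the network, only a different signal in the input, which is what distinguishes this approach from the prior work criticised above.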