We study continual learning for natural language instruction generation by observing how human users execute the generated instructions. We focus on a collaborative scenario in which the system both acts and delegates tasks to human users in natural language. We compare users' execution of the generated instructions to the system's original intent, and treat the degree of agreement as an indication of how successfully the system communicated that intent. We show how to use this signal to improve the system's ability to generate instructions via contextual bandit learning. In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.
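To make the learning signal concrete, the following is a minimal sketch of a contextual bandit update, not the system described above. It assumes a toy setting with a small fixed set of candidate instructions, a linear softmax policy, and a reward of +1 when the (simulated) user execution matches the intent and -1 otherwise. The names `BanditInstructionPolicy` and `simulate`, the feature vectors, and the reward values are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class BanditInstructionPolicy:
    """Toy linear softmax policy over a fixed set of candidate instructions.

    Hypothetical stand-in for an instruction generator: each "action" is
    one candidate instruction, and the context is a feature vector that
    encodes the system's intent.
    """

    def __init__(self, n_actions, n_features, lr=0.5):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, context):
        scores = [sum(wi * xi for wi, xi in zip(row, context)) for row in self.w]
        return softmax(scores)

    def sample(self, context, rng):
        p = self.probs(context)
        return rng.choices(range(len(p)), weights=p, k=1)[0]

    def update(self, context, action, reward):
        """Policy-gradient step: reward times the gradient of log pi(action | context)."""
        p = self.probs(context)
        for a, row in enumerate(self.w):
            coeff = (1.0 if a == action else 0.0) - p[a]
            for j, xj in enumerate(context):
                row[j] += self.lr * reward * coeff * xj


def simulate(rounds=2000, seed=0):
    """Simulated interaction loop; only the chosen instruction gets feedback."""
    rng = random.Random(seed)
    policy = BanditInstructionPolicy(n_actions=3, n_features=2)
    contexts = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical intents
    best = [0, 2]  # assumed: the instruction users execute as intended
    for _ in range(rounds):
        i = rng.randrange(len(contexts))
        action = policy.sample(contexts[i], rng)
        # Assumed binary feedback: did the execution match the intent?
        reward = 1.0 if action == best[i] else -1.0
        policy.update(contexts[i], action, reward)
    return policy, contexts, best


if __name__ == "__main__":
    policy, contexts, best = simulate()
    for ctx, b in zip(contexts, best):
        print(ctx, [round(p, 3) for p in policy.probs(ctx)], "intended:", b)
```

The defining property of the bandit setting is visible in `simulate`: the policy never sees which instruction would have been best, only a reward for the one it actually produced. A real system would replace the fixed action set with a neural generator and the simulated reward with a comparison between observed user behavior and the system's plan.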