We study the problem of continually training an instruction-following agent through feedback provided by users during collaborative interactions. During interaction, human users instruct the agent in natural language and provide real-time binary feedback as they observe its instruction execution. We cast learning as a contextual bandit problem, converting the user feedback into immediate rewards. We evaluate through multiple rounds of human-agent interactions, demonstrating a 15.4% absolute improvement in instruction execution accuracy over time. We also show that our approach is robust to several design variations, and that the feedback signal is roughly equivalent in value to the learning signal of supervised demonstration data.
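To illustrate the contextual bandit framing, the sketch below shows one way binary user feedback could be mapped to an immediate reward and used in a REINFORCE-style policy update. This is an assumed minimal example, not the paper's implementation: the toy `BanditPolicy` class, its discrete action set, and the ±1 reward mapping are illustrative choices.

```python
# Minimal sketch (assumption, not the paper's system) of casting binary
# user feedback as immediate reward in a contextual bandit update.
import math
import random


def feedback_to_reward(feedback):
    """Map binary user feedback to an immediate scalar reward."""
    return 1.0 if feedback == "positive" else -1.0


class BanditPolicy:
    """Toy softmax policy over a small discrete action set."""

    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.lr = lr
        self.logits = {a: 0.0 for a in actions}

    def probs(self):
        # Softmax over action logits.
        z = sum(math.exp(v) for v in self.logits.values())
        return {a: math.exp(v) / z for a, v in self.logits.items()}

    def sample(self):
        # Draw an action from the current policy distribution.
        p = self.probs()
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += p[a]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, action, reward):
        # REINFORCE-style bandit update: shift probability mass toward
        # (or away from) the taken action in proportion to the reward.
        p = self.probs()
        for a in self.actions:
            grad = (1.0 if a == action else 0.0) - p[a]
            self.logits[a] += self.lr * reward * grad
```

In use, the agent samples an action for the current instruction, the user's observed reaction is converted to a reward, and the policy is updated immediately, with no credit assignment across a long trajectory.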