In this work we present a case study of an embodied, machine-learning (ML) powered agent that improves itself through interactions with crowd-workers. The agent consists of a set of modules, some learned and others heuristic. While the agent is not "end-to-end" in the ML sense, end-to-end interaction is a vital part of the agent's learning mechanism. We describe how the design of the agent works together with the design of multiple annotation interfaces to allow crowd-workers to assign credit to module errors from end-to-end interactions, and to label data for individual modules. Over multiple rounds of automated human-agent interaction, credit assignment, data annotation, and model re-training and re-deployment, we demonstrate agent improvement.
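The round structure described above (interaction, credit assignment, annotation, re-training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the implementation from this work: the `Episode` and `Module` classes, the `run_round` function, and the `blamed_module` field are hypothetical names, and "re-training" is reduced to a version counter.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Episode:
    """One end-to-end interaction (hypothetical structure)."""
    command: str
    succeeded: bool
    # Credit-assignment label a crowd-worker might attach to a failure.
    blamed_module: Optional[str] = None


@dataclass
class Module:
    """A learned module with a growing pool of annotated examples."""
    name: str
    examples: list = field(default_factory=list)
    version: int = 0

    def retrain(self) -> None:
        # Stand-in for real training: only bumps a version counter.
        self.version += 1


def run_round(modules: dict, episodes: list) -> dict:
    """Route each failed episode to its blamed module, then retrain
    every module that received new data in this round."""
    new_data = defaultdict(int)
    for ep in episodes:
        # Skip successes and errors not attributed to a learned module.
        if ep.succeeded or ep.blamed_module not in modules:
            continue
        modules[ep.blamed_module].examples.append(ep.command)
        new_data[ep.blamed_module] += 1
    for name in new_data:
        modules[name].retrain()
    return dict(new_data)
```

In this sketch, only modules registered in `modules` receive data; errors blamed on anything else (for example a heuristic component) are ignored, which is one possible design choice among several.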