Robot policies need to adapt to human preferences and/or new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing work often requires costly offline re-training on human feedback, and that feedback typically must be frequent and is too complex for humans to provide reliably. To avoid placing undue burden on human experts and to allow quick adaptation in critical real-world situations, we propose designing and sparingly presenting easy-to-answer pairwise action preference queries in an online fashion. Our approach designs queries and decides when to present them so as to maximize the expected value of the information they provide. We demonstrate our approach in simulation experiments, human user studies, and real-robot experiments. In these settings, it outperforms baseline techniques while presenting fewer queries to human experts. Experiment videos, code, and appendices are available at https://sites.google.com/view/onlineactivepreferences.
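To illustrate the idea of selecting a pairwise query by its expected value of information, here is a minimal sketch. Everything in it is hypothetical rather than the paper's actual method: the posterior over reward weights is approximated by random samples, the candidate actions by random feature vectors, the expert is assumed noiseless, and `QUERY_COST` is an arbitrary threshold standing in for the cost of interrupting the human.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over reward weights, approximated by samples.
W = rng.normal(size=(200, 3))    # 200 weight samples, 3 reward features
PHI = rng.normal(size=(5, 3))    # feature vectors of 5 candidate actions

def expected_value(weights, phi):
    """Value of each action, averaged over the weight samples."""
    return (weights @ phi.T).mean(axis=0)   # shape: (num_actions,)

def evoi(i, j, W, PHI):
    """Expected value of information for asking 'prefer action i or j?'.

    Assumes a noiseless expert: an answer splits the weight samples by
    which action they score higher, and the robot then acts optimally
    under each resulting posterior.
    """
    prior_best = expected_value(W, PHI).max()
    prefers_i = (W @ PHI[i]) >= (W @ PHI[j])
    post_best = 0.0
    for mask in (prefers_i, ~prefers_i):
        if mask.any():
            p = mask.mean()   # probability of receiving this answer
            post_best += p * expected_value(W[mask], PHI).max()
    return post_best - prior_best

# Present the highest-value query only if it is worth interrupting the human.
QUERY_COST = 0.05
pairs = [(i, j) for i in range(len(PHI)) for j in range(i + 1, len(PHI))]
best_pair = max(pairs, key=lambda p: evoi(*p, W, PHI))
if evoi(*best_pair, W, PHI) > QUERY_COST:
    print("ask:", best_pair)
```

By convexity of the max, the expected value of information is always non-negative, so the robot asks only when the best query's value clears the cost threshold, which is what makes the querying sparing.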