Parameter tuning for robotic systems is a time-consuming and challenging task that often relies on the domain expertise of the human operator. Moreover, existing learning methods are not well suited to parameter tuning for many reasons, including: the absence of a clear numerical metric for `good robotic behavior'; limited data due to the reliance on real-world experiments; and the large search space of parameter combinations. In this work, we present POLAR, an open-source MATLAB toolbox of Preference Optimization and Learning Algorithms for Robotics, for systematically exploring high-dimensional parameter spaces using human-in-the-loop preference-based learning. The aim of this toolbox is to systematically and efficiently accomplish one of two objectives: 1) to optimize robotic behaviors for human operator preference; 2) to learn the operator's underlying preference landscape to better understand the relationship between adjustable parameters and operator preference. The POLAR toolbox achieves these objectives using only subjective feedback mechanisms (pairwise preferences, coactive feedback, and ordinal labels) to infer a Bayesian posterior over the underlying reward function dictating the user's preferences. We demonstrate the performance of the toolbox in simulation and present various applications of human-in-the-loop preference-based learning.
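To make the inference step concrete, below is a minimal sketch of preference-based Bayesian reward inference in the spirit described above. It is illustrative only and does not use the POLAR API: it assumes a probit (Bradley-Terry-style) likelihood on pairwise preferences and a Gaussian process prior over the latent reward, and all names (`actions`, `prefs`, `sigma`, `ell`, `sf`) and the synthetic preference data are hypothetical.

```matlab
% Minimal sketch (NOT the POLAR API): MAP inference of a latent reward
% from pairwise preferences, with a GP prior and a probit likelihood.

actions = linspace(0, 1, 15)';           % candidate parameter settings
n = numel(actions);

% Squared-exponential GP prior over the latent reward r(actions)
ell = 0.2; sf = 1;                       % assumed hyperparameters
K = sf^2 * exp(-0.5 * (actions - actions').^2 / ell^2) + 1e-6 * eye(n);

% Synthetic pairwise preferences: row [i j] means action i was preferred to j
prefs = [12 3; 11 2; 9 14; 8 1];
sigma = 0.1;                             % assumed preference-noise scale

Phi = @(z) 0.5 * erfc(-z / sqrt(2));     % standard normal CDF (probit link)

% Negative log posterior: probit preference likelihood plus GP prior term
negLogPost = @(r) -sum(log(Phi((r(prefs(:,1)) - r(prefs(:,2))) / sigma))) ...
                  + 0.5 * (r' * (K \ r));

% Derivative-free MAP search (adequate for this toy problem size)
rMAP = fminsearch(negLogPost, zeros(n, 1));

% The maximizer of the inferred reward is the current best-guess setting
[~, best] = max(rMAP);
fprintf('Inferred preferred parameter value: %.2f\n', actions(best));
```

In a full preference-based learning loop, a posterior estimate of this kind would also drive the choice of which actions to query next; that acquisition step is omitted here for brevity.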