In many situations, it is either impossible or impractical to develop and evaluate agents entirely in the target domain in which they will be deployed. This is particularly true in robotics, where experiments on hardware are much more arduous than those in simulation, and arguably even more so for learning-based agents. To this end, considerable recent effort has been devoted to developing increasingly realistic, higher-fidelity simulators. However, we lack any principled way to evaluate how good a "proxy domain" is, specifically in terms of how useful it is in helping us achieve our end objective: building an agent that performs well in the target domain. In this work, we investigate methods to address this need. We begin by clearly separating two uses of proxy domains that are often conflated: 1) their ability to be a faithful predictor of agent performance and 2) their ability to be a useful tool for learning. We then attempt to clarify the role of proxy domains and establish new proxy usefulness (PU) metrics to compare the usefulness of different proxy domains. We propose the relative predictive PU to assess the predictive ability of a proxy domain and the learning PU to quantify the usefulness of a proxy as a tool for generating learning data. Furthermore, we argue that the value of a proxy is conditioned on the task it is being used to help solve. We demonstrate how these new metrics can be used to optimize parameters of the proxy domain for which obtaining ground truth via system identification is not trivial.
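To make the first use (a proxy as a faithful predictor of agent performance) more concrete, here is a minimal sketch of one way such predictive ability could be scored. It is an assumption for illustration only and is not the relative predictive PU defined in this work: it supposes that a proxy is predictive to the extent that it preserves the pairwise ordering of agents' target-domain performance. The function name `pairwise_order_agreement` and all scores are hypothetical.

```python
from itertools import combinations


def pairwise_order_agreement(proxy_scores, target_scores):
    """Fraction of agent pairs whose order in the proxy matches the target.

    Hypothetical illustration, not the paper's relative predictive PU.
    Pairs tied in either domain are skipped because their order is undefined.
    """
    if len(proxy_scores) != len(target_scores):
        raise ValueError("need one proxy score and one target score per agent")
    agree = total = 0
    for i, j in combinations(range(len(proxy_scores)), 2):
        d_proxy = proxy_scores[i] - proxy_scores[j]
        d_target = target_scores[i] - target_scores[j]
        if d_proxy == 0 or d_target == 0:
            continue  # tie in at least one domain: skip this pair
        total += 1
        if (d_proxy > 0) == (d_target > 0):
            agree += 1  # proxy ranks this pair the same way as the target
    return agree / total if total else float("nan")


# Hypothetical scores for four agents: two candidate simulators vs. hardware.
target = [0.9, 0.7, 0.4, 0.2]
sim_a = [0.8, 0.75, 0.3, 0.1]  # preserves the hardware ranking
sim_b = [0.5, 0.9, 0.6, 0.4]   # swaps some agents relative to hardware
print(pairwise_order_agreement(sim_a, target))  # → 1.0
print(pairwise_order_agreement(sim_b, target))  # 4 of 6 pairs agree
```

Under this assumed score, `sim_a` would be preferred as an evaluation proxy even though its absolute scores differ from hardware, which reflects the distinction drawn above between predicting performance and matching it exactly; whether a proxy is useful for learning would need a separate measure.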