Robots can learn from humans by asking questions. In each question the robot demonstrates a few different behaviors and asks the human for their favorite. But how should robots choose which questions to ask? Today's robots optimize for informative questions that actively probe the human's preferences as efficiently as possible. But while informative questions make sense from the robot's perspective, human onlookers often find them arbitrary and misleading. In this paper we formalize active preference-based learning from the human's perspective. We hypothesize that, from the human's point of view, the robot's questions reveal what the robot has and has not learned. This insight enables robots to use questions to make their learning process transparent to the human operator. We develop and test a model that robots can leverage to relate the questions they ask to the information these questions reveal. We then introduce a trade-off between informative and revealing questions that considers both the human's and the robot's perspectives: a robot that optimizes for this trade-off actively gathers information from the human while simultaneously keeping the human up to date with what it has learned. We evaluate our approach across simulations, online surveys, and in-person user studies. Videos of our user studies and results are available here: https://youtu.be/tC6y_jHN7Vw.
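To make the trade-off concrete, below is a minimal sketch of how a robot might score candidate questions. Every detail here is an illustrative assumption for exposition, not the paper's exact model: the sampled belief over reward weights, the Boltzmann answer model, the reveal_score proxy for the human's perspective, and the weight lam are all stand-ins.

import numpy as np

rng = np.random.default_rng(0)

# Belief over the human's reward weights w, approximated by samples
# (illustrative assumption: 100 hypotheses over 3 reward features).
W = rng.normal(size=(100, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def answer_probs(question, w):
    # Boltzmann-rational human: picks behavior i with probability
    # proportional to exp(w . phi_i), where each row of `question`
    # is one demonstrated behavior's feature vector phi_i.
    logits = question @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

def info_gain(question, W):
    # Robot's perspective: mutual information between the human's
    # answer and the reward weights (standard active-learning objective).
    P = np.array([answer_probs(question, w) for w in W])  # (|W|, k)
    p_mean = P.mean(axis=0)
    H_answer = -(p_mean * np.log(p_mean + 1e-12)).sum()
    H_given_w = -(P * np.log(P + 1e-12)).sum(axis=1).mean()
    return H_answer - H_given_w

def reveal_score(question, W):
    # Human's perspective (an assumed proxy, not the paper's model):
    # how clearly the question showcases the robot's current estimate,
    # measured as the mean estimate's confidence in its favorite option.
    w_hat = W.mean(axis=0)
    return answer_probs(question, w_hat).max()

def question_score(question, W, lam=0.5):
    # Trade-off between informative (robot) and revealing (human) questions.
    return lam * info_gain(question, W) + (1.0 - lam) * reveal_score(question, W)

# Choose the best of several candidate questions, each showing 3 behaviors.
candidates = [rng.normal(size=(3, 3)) for _ in range(20)]
best_question = max(candidates, key=lambda q: question_score(q, W, lam=0.5))

In this sketch, lam = 1 recovers a purely informative questioner, while lam = 0 asks only questions that showcase what the robot currently believes; intermediate values gather information while keeping the human informed.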