Existing observational approaches for learning human preferences, such as inverse reinforcement learning, usually make strong assumptions about the observability of the human's environment. However, in reality, people make many important decisions under uncertainty. To better understand preference learning in these cases, we study the setting of inverse decision theory (IDT), a previously proposed framework where a human is observed making non-sequential binary decisions under uncertainty. In IDT, the human's preferences are conveyed through their loss function, which expresses a tradeoff between different types of mistakes. We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision. Interestingly, we show that it is actually easier to identify preferences when the decision problem is more uncertain. Furthermore, uncertain decision problems allow us to relax the unrealistic assumption that the human is an optimal decision maker but still identify their exact preferences; we give sample complexities in this suboptimal case as well. Our analysis contradicts the intuition that partial observability should make preference learning more difficult. It also provides a first step towards understanding and improving preference learning methods for uncertain and suboptimal humans.
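The claim that more uncertain decision problems make preferences easier to identify can be illustrated with a small simulation. The sketch below is not the paper's estimator or analysis. It assumes a standard cost-sensitive loss with hypothetical costs `c_fp` (false positive) and `c_fn` (false negative), an optimal human who decides 1 exactly when the posterior probability `q` of the positive label is at least `c_fp / (c_fp + c_fn)`, and an observer who knows `q` for every observed decision. All function names are illustrative.

```python
import random


def optimal_decision(q, c_fp, c_fn):
    """Bayes-optimal binary decision under a cost-sensitive loss (assumed form).

    Deciding 1 has expected cost (1 - q) * c_fp; deciding 0 has expected
    cost q * c_fn. Equivalent to thresholding q at c_fp / (c_fp + c_fn).
    """
    return 1 if q * c_fn >= (1 - q) * c_fp else 0


def threshold_interval(posteriors, decisions):
    """Range of thresholds consistent with the observed decisions."""
    # Largest posterior at which the human still decided 0.
    lo = max((q for q, d in zip(posteriors, decisions) if d == 0), default=0.0)
    # Smallest posterior at which the human already decided 1.
    hi = min((q for q, d in zip(posteriors, decisions) if d == 1), default=1.0)
    return lo, hi


def simulate(sample_q, n, c_fp, c_fn, seed=0):
    """Observe n optimal decisions and return the consistent threshold range."""
    rng = random.Random(seed)
    qs = [sample_q(rng) for _ in range(n)]
    ds = [optimal_decision(q, c_fp, c_fn) for q in qs]
    return threshold_interval(qs, ds)


def uncertain(rng):
    """Hypothetical uncertain problem: posteriors spread over [0, 1]."""
    return rng.random()


def near_certain(rng):
    """Hypothetical near-certain problem: posteriors close to 0 or 1."""
    return rng.choice([rng.uniform(0.0, 0.05), rng.uniform(0.95, 1.0)])


if __name__ == "__main__":
    # With c_fp = 1 and c_fn = 3 the true threshold is 1 / (1 + 3) = 0.25.
    for name, sampler in [("uncertain", uncertain), ("near-certain", near_certain)]:
        lo, hi = simulate(sampler, n=1000, c_fp=1.0, c_fn=3.0)
        print(f"{name}: threshold lies in [{lo:.4f}, {hi:.4f}]")
```

In the uncertain problem many observed decisions fall close to the true threshold, so the consistent range shrinks as more decisions are observed. In the near-certain problem no decision is ever made near the threshold, so the range stays wide however many decisions are observed.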