Given two sources of evidence about a latent variable, one can combine the information from both by multiplying the likelihoods of each piece of evidence. However, when one or both of the observation models are misspecified, the distributions will conflict. We study this problem in the setting with two conflicting reward functions learned from different sources. In such a setting, we would like to retreat to a broader distribution over reward functions, in order to mitigate the effects of misspecification. We assume that an agent will maximize expected reward given this distribution over reward functions, and identify four desiderata for this setting. We propose a novel algorithm, Multitask Inverse Reward Design (MIRD), and compare it to a range of simple baselines. While all methods must trade off between conservatism and informativeness, through a combination of theory and empirical results on a toy environment, we find that MIRD and its variant MIRD-IF strike a good balance between the two.
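The combination rule in the first sentence is ordinary Bayesian updating with conditionally independent observations: the posterior over the latent reward is proportional to the prior times the product of each source's likelihood. The minimal numerical sketch below, with entirely hypothetical reward hypotheses and likelihood values (none of them from the paper), shows that rule and the failure mode described in the abstract when the two sources conflict.

```python
import numpy as np

# Illustrative sketch only: hypothesis names and numbers are made up, not taken
# from the paper. It shows the combination the abstract refers to (multiplying
# the likelihoods of two evidence sources) and how conflicting, possibly
# misspecified observation models yield an overconfident posterior.

reward_hypotheses = ["r1", "r2", "r3"]           # candidate reward functions
prior = np.array([1 / 3, 1 / 3, 1 / 3])          # uniform prior over them

likelihood_source_a = np.array([0.60, 0.39, 0.01])  # source A strongly favors r1
likelihood_source_b = np.array([0.01, 0.39, 0.60])  # source B strongly favors r3

# Combine the evidence by multiplying likelihoods and renormalizing
# (Bayes' rule with conditionally independent observations).
posterior = prior * likelihood_source_a * likelihood_source_b
posterior /= posterior.sum()

print(dict(zip(reward_hypotheses, posterior.round(3))))
# -> {'r1': 0.037, 'r2': 0.927, 'r3': 0.037}
# The product piles nearly all mass onto the "compromise" hypothesis r2, which
# neither source considered likely on its own. If either observation model is
# misspecified, this confidence is unwarranted -- the motivation for retreating
# to a broader distribution over reward functions.
```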