Inferring reward functions from human behavior is at the core of value alignment: aligning AI objectives with what we, humans, actually want. But doing so relies on models of how humans behave given their objectives. After decades of research in cognitive science, neuroscience, and behavioral economics, obtaining accurate human models remains an open research topic. This raises the question: how accurate do these models need to be for the inferred reward to be accurate? On the one hand, if small errors in the model can lead to catastrophic errors in inference, the entire framework of reward learning seems ill-fated, as we will never have perfect models of human behavior. On the other hand, if we can guarantee that reward accuracy improves as our models improve, this would demonstrate the value of further work on modeling. We study this question both theoretically and empirically. We show that, unfortunately, it is possible to construct small adversarial biases in behavior that lead to arbitrarily large errors in the inferred reward. However, and arguably more importantly, we also identify reasonable assumptions under which the reward inference error can be bounded linearly in the error in the human model. Finally, we verify our theoretical insights in discrete and continuous control tasks with simulated and human data.
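To make the question concrete, here is a minimal numerical sketch (our own illustration, not the paper's actual setup): a learner that assumes a Boltzmann-rational human infers reward parameters by maximum likelihood, while the actual human's choice logits carry a small bias of magnitude eps. All names (features, theta_true, beta, eps) are hypothetical, and the printed relative reward error is only meant to illustrate how inference error can grow with the error in the assumed human model.

import numpy as np

rng = np.random.default_rng(0)

# Small discrete setting: K candidate actions, each with a d-dimensional feature vector.
K, d = 8, 3
features = rng.normal(size=(K, d))
theta_true = rng.normal(size=d)   # ground-truth reward parameters
beta = 2.0                        # rationality coefficient assumed by the learner

def boltzmann(theta, logit_bias=0.0):
    """Choice distribution of a (possibly biased) Boltzmann-rational human."""
    logits = beta * features @ theta + logit_bias
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def mle_reward(counts, steps=5000, lr=0.1):
    """Max-likelihood reward estimate under the *unbiased* Boltzmann model."""
    theta = np.zeros(d)
    n = counts.sum()
    for _ in range(steps):
        p = boltzmann(theta)
        # Gradient of the log-likelihood: observed vs. expected feature counts.
        grad = beta * (counts @ features - n * (p @ features))
        theta += lr * grad / n
    return theta

for eps in [0.0, 0.1, 0.3, 1.0]:
    bias = eps * rng.normal(size=K)                   # small perturbation of the human's logits
    p_human = boltzmann(theta_true, logit_bias=bias)  # actual (misspecified) behavior
    counts = rng.multinomial(20000, p_human)          # observed demonstrations
    theta_hat = mle_reward(counts)
    err = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
    print(f"model error eps={eps:4.1f}  relative reward error={err:.3f}")

In this toy example the reward error stays small when the behavioral bias is small, but the bias direction here is random; the adversarial constructions discussed above choose the bias to maximize damage, which is why additional assumptions are needed for a linear bound.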