When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are currently not endowed with a moral sense. As a consequence, they may learn to behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, it will become necessary to mitigate inherited biases from environments that teach immoral behavior. To facilitate the development of agents that avoid causing wanton harm, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios. By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. Using models with commonsense moral knowledge, we create an elementary artificial conscience that assesses and guides agents. In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.