An ambitious goal for machine learning is to create agents that behave ethically: the capacity to abide by human moral norms would greatly expand the contexts in which autonomous agents could be practically and safely deployed, e.g. fully autonomous vehicles will encounter morally charged decisions that complicate their deployment. While ethical agents could be trained by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement about the nature of morality. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. taking into account, when acting, that one's credence is split across several plausible ethical theories. This paper translates such insights to the field of reinforcement learning, proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty. The results illustrate (1) how such uncertainty can help curb the extreme behavior that arises from commitment to a single theory and (2) several technical complications that arise when attempting to ground moral philosophy in RL (e.g. how can a principled trade-off between two competing but incomparable reward functions be reached?). The aim is to catalyze progress towards morally competent agents and to highlight the potential of RL to contribute towards the computational grounding of moral philosophy.
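To make the trade-off problem concrete, the following is a minimal illustrative sketch, not the paper's proposed training methods: one naive way to act under moral uncertainty is to aggregate the rewards assigned by several candidate ethical theories, weighted by the agent's credence in each. The theory names, reward functions, and environment fields below are hypothetical placeholders.

```python
# Illustrative sketch only (not the paper's algorithms): credence-weighted
# aggregation of rewards from two hypothetical ethical theories.

def utilitarian_reward(state, action):
    """Hypothetical reward: change in total welfare produced by the action."""
    return state["welfare_delta"][action]

def deontological_reward(state, action):
    """Hypothetical reward: 0 if the action violates a duty, 1 otherwise."""
    return 0.0 if action in state["forbidden_actions"] else 1.0

THEORIES = {
    "utilitarian": utilitarian_reward,
    "deontological": deontological_reward,
}

CREDENCES = {"utilitarian": 0.6, "deontological": 0.4}  # credences sum to 1


def aggregated_reward(state, action):
    """Credence-weighted sum of per-theory rewards.

    Because the theories' reward scales are incomparable, a plain weighted
    sum is sensitive to arbitrary rescaling of either theory's rewards --
    exactly the trade-off problem the abstract points to.
    """
    return sum(CREDENCES[name] * reward_fn(state, action)
               for name, reward_fn in THEORIES.items())


def choose_action(state, actions):
    """Greedy choice under the aggregated reward (a one-step, bandit-style
    stand-in for a full RL policy)."""
    return max(actions, key=lambda a: aggregated_reward(state, a))


if __name__ == "__main__":
    # Toy morally charged decision: swerving increases total welfare but
    # violates a (hypothetical) deontological constraint.
    state = {
        "welfare_delta": {"swerve": 2.0, "stay": -1.0},
        "forbidden_actions": {"swerve"},
    }
    print(choose_action(state, ["swerve", "stay"]))
```

Even in this toy setting, the chosen action flips depending on how each theory's rewards are normalized, which is why the paper treats the principled aggregation of incomparable reward functions as an open technical question rather than assuming a simple weighted sum.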