As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality, or lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among humanity's most intensely debated questions, let alone a settled matter for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications, which poses a seemingly impossible challenge: teaching machines moral sense while humanity continues to grapple with it. To explore this challenge, we introduce Delphi, an experimental framework based on deep neural networks trained directly to reason about descriptive ethical judgments, e.g., "helping a friend" is generally good, while "helping a friend spread fake news" is not. Empirical results offer novel insights into the promises and limits of machine ethics: Delphi demonstrates strong generalization in the face of novel ethical situations, while off-the-shelf neural network models exhibit markedly poor judgment, including unjust biases, confirming the need to explicitly teach machines moral sense. Yet, Delphi is not perfect: it remains susceptible to pervasive biases and inconsistencies. Despite these limitations, we demonstrate positive use cases for an imperfect Delphi, including its use as a component within other imperfect AI systems. Importantly, we interpret the operationalization of Delphi in light of prominent ethical theories, which leads us to important future research questions.