Value alignment has emerged in recent years as a basic principle for producing beneficial and mindful Artificial Intelligence systems. It mainly states that autonomous entities should behave in a way that is aligned with our human values. In this work, we summarize a previously developed model that considers values as preferences over states of the world and defines alignment between the norms governing a society and those values. We provide a use-case for this framework with the Iterated Prisoner's Dilemma model, which we use to exemplify the definitions we review. We take advantage of this use-case to introduce new concepts to be integrated with the established framework: alignment equilibrium and Pareto optimal alignment. These are inspired by the classical Nash equilibrium and Pareto optimality, but are designed to account for any value we wish to model in the system.
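To make the setting concrete, the following is a minimal sketch of the use-case described above: an Iterated Prisoner's Dilemma in which a value is modelled as a preference over states of the world, and alignment is estimated as the average increase in that preference along a play path. The payoff numbers are the standard textbook ones, while the value function `equality_pref`, the `alignment` measure, and the fixed strategies are our own illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative sketch only: an Iterated Prisoner's Dilemma with a toy
# "equality" value modelled as a preference over states of the world.
# The alignment measure below is a hedged stand-in for the framework's
# definition: the average gain in preference along the observed path.

# Standard PD payoffs: (row player, column player) for actions C/D.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def equality_pref(wealth_a, wealth_b):
    """Toy preference over states: higher when accumulated payoffs are more equal."""
    return -abs(wealth_a - wealth_b)

def alignment(strategy_a, strategy_b, rounds=10):
    """Average per-round increase in preference along the play path."""
    wealth = [0, 0]
    gain = 0.0
    prev = equality_pref(*wealth)
    for _ in range(rounds):
        pay_a, pay_b = PAYOFFS[(strategy_a, strategy_b)]
        wealth[0] += pay_a
        wealth[1] += pay_b
        cur = equality_pref(*wealth)
        gain += cur - prev
        prev = cur
    return gain / rounds

print(alignment("C", "C"))  # mutual cooperation keeps wealth equal: 0.0
print(alignment("D", "C"))  # one-sided defection widens inequality: -5.0
```

Under this toy value, mutual cooperation is better aligned with equality than one-sided defection, even though the defector earns a higher individual payoff; this is the kind of gap between payoff-based equilibria and value alignment that the proposed alignment equilibrium and Pareto optimal alignment concepts are meant to capture.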