We present what we call the Interpretation Problem, whereby any rule in symbolic form is open to infinite interpretation in ways that we might disapprove of, and we argue that any attempt to build morality into machines is subject to it. We show how the Interpretation Problem in Artificial Intelligence is an illustration of Wittgenstein's general claim that no rule can contain the criteria for its own application, and that the risks created by this problem escalate in proportion to the degree to which the machine is causally connected to the world, in what we call the Law of Interpretative Exposure. Using game theory, we attempt to define the structure of normative spaces and argue that any rule-following within a normative space is guided by values that are external to that space and that cannot themselves be represented as rules. In light of this, we categorise the types of mistakes an artificial moral agent could make into Mistakes of Intention and Instrumental Mistakes. We then propose ways of building morality into machines by getting them to interpret the rules we give them in accordance with these external values, through explicit moral reasoning, the Show, not Tell paradigm, the adjustment of the causal power and structure of the agent, and relational values, with the ultimate aim that the machine develop a virtuous character and that the impact of the Interpretation Problem be minimised.