With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine three of them. The first issue is to clarify which demands stakeholders might usefully make on the designers of AI systems, where usefully means that the technology to implement them already exists. We make this technical topic more accessible by using the framing of cognitive architectures. The second issue is to move beyond an analytical framing that treats useful intelligence as reward maximization only. To support this move, we define several AI cognitive architectures that combine reward maximization with other technical elements designed to improve alignment. The third issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.