Robots of the future will exhibit increasingly human-like, and even super-human, intelligence across a myriad of tasks. They are also likely to fail, and to be noncompliant with human preferences, in increasingly subtle ways. Toward the goal of achieving autonomous robots, the robot learning community has made rapid strides in applying machine learning techniques to train robots through data and interaction. This makes the study of how best to audit these algorithms for their compatibility with humans both pertinent and urgent. In this paper, we draw inspiration from the AI Safety and Alignment communities and make the case that we need to urgently consider ways in which we can best audit our robot learning algorithms to check for failure modes and to ensure that, when operating autonomously, they indeed behave in the ways that their human designers intend. We believe this is a challenging problem that will require effort from the entire robot learning community, and we do not attempt to provide a concrete framework for auditing. Instead, we outline high-level guidance and a possible approach toward formulating such a framework, which we hope will serve as a useful starting point for thinking about auditing in the context of robot learning.