A major concern among AI safety practitioners is the possibility of loss of control, whereby humans lose the ability to exert control over increasingly advanced AI systems. The range of concerns is wide, spanning present-day risks to future existential risks, and covering loss-of-control pathways that range from rapid AI self-exfiltration scenarios to more gradual disempowerment scenarios. In this work we set out, firstly, to provide a more structured framework for discussing and characterizing loss of control and, secondly, to use this framework to help those responsible for the safe operation of AI-containing socio-technical systems identify causal factors that lead to loss of control. We explore how these two needs can be better met by applying a methodology developed within the safety-critical systems community, known as STAMP, and its associated hazard analysis technique, STPA. We select the STAMP methodology primarily because it is built on the view that socio-technical systems can be functionally modeled as control structures, and that safety issues arise when control is lost within these structures.