Large Language Models (LLMs), such as ChatGPT and BERT, are leading a new AI heatwave thanks to their human-like conversations with detailed and articulate answers across many domains of knowledge. While LLMs are being rapidly adopted in many AI application domains, we are interested in the following question: can safety analysis for safety-critical systems make use of LLMs? To answer this, we conduct a case study applying Systems Theoretic Process Analysis (STPA) to Automatic Emergency Braking (AEB) systems using ChatGPT. STPA, one of the most prevalent techniques for hazard analysis, is known to suffer from limitations such as high complexity and subjectivity, and this paper explores the use of ChatGPT to address them. Specifically, we investigate three ways of incorporating ChatGPT into STPA, distinguished by the mode of interaction with human experts: one-off simplex interaction, recurring simplex interaction, and recurring duplex interaction. Comparative results reveal that: (i) using ChatGPT without human experts' intervention can be inadequate, owing to the reliability and accuracy issues of LLMs; (ii) more interactions between ChatGPT and human experts may yield better results; and (iii) using ChatGPT in STPA with extra care can outperform human safety experts working alone, as demonstrated by reusing an existing comparison method with baselines. In addition to making the first attempt to apply LLMs in safety analysis, this paper also identifies key challenges (e.g., the trustworthiness of LLMs and the need for standardisation) for future research in this direction.