Due to various sources of uncertainty, emergent behavior, and ongoing changes, the reliability of many socio-technical systems depends on an iterative and collaborative process in which organizations (1) analyze and learn from system failures, and then (2) co-evolve both the technical and human parts of their systems based on what they learn. Many organizations have defined processes for learning from failure, often involving postmortem analyses conducted after any system failures that are judged to be sufficiently severe. Despite established processes and tool support, our preliminary research, and professional experience, suggest that it is not straightforward to take what was learned from a failure and successfully improve the reliability of the socio-technical system. To better understand this collaborative process and the associated challenges, we are conducting a study of how teams learn from failure. We are gathering incident reports from multiple organizations and conducting interviews with engineers and managers with relevant experience. Our analytic interest is in what is learned by teams as they reflect on failures, the learning processes involved, and how they use what is learned. Our data collection and analysis are not yet complete, but we have so far analyzed 13 incident reports and seven interviews. In this short paper we (1) present our preliminary findings, and (2) outline our broader research plans.
翻译:暂无翻译