Many industrial sectors have been collecting large volumes of sensor data. With recent technologies for processing big data, companies can exploit these data for automatic failure detection and prevention. We propose the first completely automated method for failure analysis, which machine-learns fault trees from raw observational data with continuous variables. Our method scales well and is tested on a real-world, five-year dataset of domestic heater operations in the Netherlands, with 31 million unique heater-day readings, each containing 27 sensor and 11 failure variables. Our method builds on two previous procedures: the C4.5 decision-tree learning algorithm, and the LIFT algorithm for learning fault trees from Boolean data. C4.5 pre-processes each continuous variable: it learns an optimal numerical threshold which distinguishes between faulty and normal operation of the top-level system. These thresholds discretise the variables into Boolean ones, thus allowing LIFT to learn fault trees which model the root failure mechanisms of the system and are explainable. We obtain fault trees for the 11 failure variables, and evaluate them in two ways: quantitatively, with a significance score, and qualitatively, with domain specialists. Some of the fault trees learnt have almost maximum significance (above 0.95), while others have medium-to-low significance (around 0.30), reflecting the difficulty of learning from big, noisy, real-world sensor data. The domain specialists confirm that the fault trees model meaningful relationships among the variables.
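The threshold-based pre-processing step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single continuous sensor variable and a single Boolean failure label, takes candidate thresholds at the midpoints between consecutive distinct values, and picks the one with the highest information gain, in the spirit of C4.5's handling of continuous attributes. The function names (`entropy`, `best_threshold`, `discretise`) and the sample readings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[bool]) -> float:
    """Binary Shannon entropy (in bits) of a sequence of Boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_threshold(values: Sequence[float],
                   failed: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, information gain) for the best binary split.

    Simplifying assumption: candidates are midpoints between consecutive
    distinct sorted values, scored by plain information gain.
    """
    pairs = sorted(zip(values, failed))
    n = len(pairs)
    base = entropy([f for _, f in pairs])
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [f for _, f in pairs[:i]]
        right = [f for _, f in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    if best_t is None:
        raise ValueError("need at least two distinct values to split")
    return best_t, best_gain


def discretise(values: Sequence[float], threshold: float) -> List[bool]:
    """Turn a continuous variable into a Boolean one: True above threshold."""
    return [v > threshold for v in values]


# Hypothetical readings of one sensor and the matching failure flags.
readings = [1.0, 1.5, 1.25, 2.5, 2.75, 3.0]
failures = [False, False, False, True, True, True]

threshold, gain = best_threshold(readings, failures)
boolean_column = discretise(readings, threshold)
```

Applying this to every continuous variable would yield an all-Boolean dataset of the kind a fault-tree learner such as LIFT expects as input. The sketch scans all split points naively, so it would need a more efficient implementation to handle millions of readings.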