Most existing failure detection algorithms rely on statistical methods, and very few use machine learning (ML). This paper explores the viability of ML in the field of failure detection: is it possible to implement an ML-based detector that achieves a satisfactory quality of service? We implement a prototype that uses a basic long short-term memory (LSTM) neural network and study its behavior on real traces. Although the ML model requires comparatively longer computation time, our prototype performs well in terms of both accuracy and detection time.
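The abstract says only that the prototype uses "a basic LSTM". The sketch below assumes a common framing for heartbeat-based failure detectors: predict when the next heartbeat should arrive, and suspect the monitored process once that estimate plus a safety margin has passed. Here an LSTM stands in for the usual statistical estimator of the next inter-arrival time. The class names, the window size, and the margin `alpha` are hypothetical. The weights are random and untrained, so the code illustrates only the data flow, not the prototype's accuracy or detection time.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMPredictor:
    """Hypothetical single-layer LSTM mapping a window of past heartbeat
    inter-arrival times to a predicted next inter-arrival time.
    Weights are random and untrained (illustration only)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.H = hidden_size
        # Stacked weights for the input, forget, candidate and output gates,
        # applied to the concatenation [x_t, h_{t-1}].
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_size, 1 + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.w_out = rng.normal(0.0, 0.1, size=hidden_size)
        self.b_out = 0.0

    def predict(self, window):
        h = np.zeros(self.H)
        c = np.zeros(self.H)
        scale = float(np.mean(window))  # normalize inputs to roughly 1
        for x in window:
            z = self.W @ np.concatenate(([x / scale], h)) + self.b
            i = sigmoid(z[: self.H])                # input gate
            f = sigmoid(z[self.H : 2 * self.H])     # forget gate
            g = np.tanh(z[2 * self.H : 3 * self.H]) # candidate cell state
            o = sigmoid(z[3 * self.H :])            # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        y = self.w_out @ h + self.b_out
        # exp keeps the predicted interval strictly positive.
        return float(scale * np.exp(y))


class HeartbeatDetector:
    """Suspects a process once no heartbeat has arrived by
    last_arrival + predicted_interval + alpha (hypothetical scheme)."""

    def __init__(self, predictor, window=10, alpha=0.05):
        self.predictor = predictor
        self.window = window
        self.alpha = alpha  # safety margin in seconds (assumed value)
        self.arrivals = []

    def heartbeat(self, t):
        self.arrivals.append(t)

    def deadline(self):
        # Need window + 1 arrivals to form `window` inter-arrival times.
        if len(self.arrivals) < self.window + 1:
            return None
        recent = np.diff(self.arrivals[-(self.window + 1):])
        return self.arrivals[-1] + self.predictor.predict(recent) + self.alpha

    def suspect(self, now):
        d = self.deadline()
        return d is not None and now > d
```

For example, after feeding heartbeats at `t = 0, 1, ..., 10`, `det.suspect(10.0)` is `False` and `det.suspect(1e6)` is `True`. The trade-off noted in the abstract shows up here directly: each call to `deadline()` runs the LSTM over the whole window, which costs more than updating a running mean as a statistical estimator would.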