Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies is immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers with strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches that improve the performance of machine learning algorithms in accurately classifying anomalous TCP packets. X-FLASH uses XGBoost as an ensemble model and couples it with a sequential optimizer, FLASH, borrowed from search-based software engineering, to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach by up to 28\%, 29\%, and 40\% (relative) for F-measure, G-score, and recall, respectively, in fewer than 30 evaluations. Given (1) the large improvement and (2) the simplicity of the tuning, we recommend that future research include a tuning study as a new standard, at least in the area of scientific workflow anomaly detection.
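The idea of coupling a learner with a sequential optimizer under a small evaluation budget can be sketched as follows. This is a minimal illustration, not the X-FLASH implementation: the search space, the inverse-distance surrogate, `toy_objective`, and every function name here are assumptions made for the example. In a real setting, `evaluate` would train XGBoost with the given configuration and return a validation score such as F-measure.

```python
import random

# Hypothetical search space. The names mirror common XGBoost parameters,
# but the value ranges are illustrative assumptions.
SPACE = {
    "max_depth": list(range(2, 13)),
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "n_estimators": list(range(50, 501, 50)),
}


def sample_config(rng):
    """Draw one random configuration from SPACE."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def normalize(config):
    """Map each parameter to [0, 1] by its position in the value list."""
    return [SPACE[n].index(config[n]) / (len(SPACE[n]) - 1) for n in SPACE]


def surrogate_predict(config, evaluated):
    """Cheap surrogate (an assumption for this sketch): the
    inverse-distance-weighted mean of the scores measured so far."""
    point = normalize(config)
    num = den = 0.0
    for other, score in evaluated:
        d = sum((a - b) ** 2 for a, b in zip(point, normalize(other)))
        if d == 0:
            return score  # identical configuration already measured
        num += score / d
        den += 1.0 / d
    return num / den


def sequential_search(evaluate, budget=30, n_initial=10, pool_size=500, seed=0):
    """Evaluate a few random configurations, then repeatedly evaluate the
    unevaluated candidate that the surrogate predicts to be best."""
    rng = random.Random(seed)
    pool = [sample_config(rng) for _ in range(pool_size)]
    evaluated = []
    for _ in range(n_initial):
        config = pool.pop(rng.randrange(len(pool)))
        evaluated.append((config, evaluate(config)))
    while len(evaluated) < budget and pool:
        best_idx = max(
            range(len(pool)),
            key=lambda i: surrogate_predict(pool[i], evaluated),
        )
        config = pool.pop(best_idx)
        evaluated.append((config, evaluate(config)))
    best = max(evaluated, key=lambda item: item[1])
    return best, len(evaluated)


def toy_objective(config):
    """Stand-in for 'train the model and return a validation score'."""
    penalty = (
        abs(config["max_depth"] - 6) / 10
        + abs(config["learning_rate"] - 0.1)
        + abs(config["n_estimators"] - 300) / 450
    )
    return 1.0 - penalty / 3


if __name__ == "__main__":
    (best_config, best_score), n_evals = sequential_search(toy_objective)
    print(n_evals, best_config, round(best_score, 3))
```

The point of the design is that each expensive evaluation (a full model training run) is spent only on the candidate that a cheap surrogate ranks highest, which is why a budget of about 30 evaluations can be enough.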