A Hierarchical Approach to Conditional Random Fields for System Anomaly Detection

Detecting unusual events in large-scale systems in a time-sensitive manner is critical in many industries, e.g., bank fraud detection, enterprise systems, and medical alerts. Large-scale systems often grow in size and complexity over time, so anomaly detection algorithms need to adapt to changing structures. A hierarchical approach takes advantage of both the implicit relationships in complex systems and localized context. The features of a complex system may vary drastically in data distribution and capture different aspects from multiple data sources; taken together, they provide a more complete view of the system. In this paper, two datasets are considered: the first comprises system metrics from machines running on a cloud service, and the second comprises application metrics from a large-scale distributed software system with inherent hierarchies and interconnections among its system nodes. A comparison of the changepoint-based PELT algorithm, cognitive learning-based Hierarchical Temporal Memory algorithms, Support Vector Machines, and Conditional Random Fields provides a basis for proposing a Hierarchical Global-Local Conditional Random Field approach that accurately captures anomalies in complex systems across various features. Hierarchical algorithms can learn the intricacies of specific features and use them in a global, abstracted representation to detect anomalous patterns robustly across multi-source feature data and distributed systems. A graphical network analysis of complex systems can further refine datasets by mining relationships among the available features, which can benefit hierarchical models. Furthermore, hierarchical solutions can adapt well to changes at a localized level, learning from new data and changing environments when parts of a system are overhauled, and translating what they learn into a global view of the system over time.
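The global-local idea can be illustrated with a minimal sketch. It is not the paper's model: the local level below uses a plain z-score per feature as a stand-in for a learned per-feature model, and the global level is a linear-chain scorer with hand-set (not trained) weights, decoded with the Viterbi algorithm as a CRF would be at inference time. The function names, the `baseline`, `threshold`, and `switch_cost` parameters, and the sample metrics are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def local_scores(series, baseline=5):
    """Local level (stand-in): z-score of each point of ONE feature,
    measured against that feature's first `baseline` observations."""
    mu = mean(series[:baseline])
    sigma = pstdev(series[:baseline]) or 1.0  # avoid division by zero
    return [abs(x - mu) / sigma for x in series]


def global_decode(score_rows, threshold=3.0, switch_cost=1.0):
    """Global level: Viterbi decoding over labels 0 (normal) / 1 (anomalous).

    score_rows[t] holds the local scores of every feature at time t.
    Hand-set scoring (an assumption, not learned weights):
      - label 1 earns (max local score - threshold), label 0 earns the negative;
      - changing label between consecutive steps costs `switch_cost`.
    """
    emissions = []
    for row in score_rows:
        e = max(row) - threshold
        emissions.append((-e, e))

    best = list(emissions[0])  # best path score ending in each label
    back = []                  # back-pointers for each later time step
    for em in emissions[1:]:
        new_best, pointers = [], []
        for label in (0, 1):
            candidates = [
                best[prev] - (switch_cost if prev != label else 0.0)
                for prev in (0, 1)
            ]
            prev_best = 0 if candidates[0] >= candidates[1] else 1
            new_best.append(candidates[prev_best] + em[label])
            pointers.append(prev_best)
        best = new_best
        back.append(pointers)

    # Trace the best label sequence backwards from the final step.
    label = 0 if best[0] >= best[1] else 1
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


def detect(features):
    """Score each feature locally, then label every time step globally."""
    per_feature = [local_scores(series) for series in features.values()]
    rows = list(zip(*per_feature))  # one row of local scores per time step
    return global_decode(rows)


# Hypothetical metrics: a CPU spike at steps 7-8, memory stays flat.
metrics = {
    "cpu": [10, 11, 10, 9, 10, 10, 11, 50, 52, 10],
    "mem": [5, 5, 6, 5, 5, 5, 5, 5, 5, 5],
}
labels = detect(metrics)
print(labels)
```

Because each feature is scored against its own baseline, features with very different distributions can feed the same global layer, and a local scorer can be replaced or refit on its own when part of the system changes. The switching cost in the global layer discourages isolated single-step flips whose evidence is weak.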
