There is increasing interest in developing new data-driven models for assessing the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and largest Wi-Fi trace known to date.