Most recent network failure diagnosis systems have focused on data center networks, where complex measurement systems can be deployed to derive routing information and ensure network coverage, enabling accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first present a new multi-output prediction model that maps application-level observations directly to the locations of system component failures. In practice, this application-centric approach may face a missing-data challenge: some input (feature) data for the inference models may be missing because measurements in wide-area networks are incomplete or lost. We show that the presented prediction model naturally allows multivariate imputation to recover the missing data. We evaluate multiple imputation algorithms and show that the prediction performance can be improved significantly in a large-scale network. To the best of our knowledge, this is the first study to address the missing-data issue and apply imputation techniques in network failure localization.
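The idea of imputing missing application-level features before a multi-output failure prediction can be illustrated with a minimal sketch. It is not the model or the imputation algorithms evaluated in this paper: it assumes a nearest-neighbour imputer as a simple stand-in for multivariate imputation and a 1-nearest-neighbour lookup as a stand-in for the multi-output predictor. The function names (`knn_impute`, `predict_failures`), the three flow-throughput features, and the two link labels are all hypothetical.

```python
import math


def knn_impute(rows, k=1):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance uses only the features observed in both rows, so the other
    features guide the fill (a simple form of multivariate imputation).
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows in which feature j was actually measured.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                new_row[j] = sum(nearest) / len(nearest)
            else:
                # Fallback: mean of the observed values, or 0.0 if there are none.
                col = [r[j] for r in rows if r[j] is not None]
                new_row[j] = sum(col) / len(col) if col else 0.0
        filled.append(new_row)
    return filled


def predict_failures(train_x, train_y, query):
    """Multi-output stand-in: return the label vector of the closest training row."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


# Hypothetical data: throughput of three flows, failure flags for links A and B.
train_x = [
    [10.0, 10.0, 10.0],  # all links healthy
    [2.0, 10.0, 9.0],    # link A failed, flow 0 degraded
    [10.0, 3.0, 2.0],    # link B failed, flows 1 and 2 degraded
]
train_y = [[0, 0], [1, 0], [0, 1]]

observed = [10.0, None, 2.5]  # the measurement for flow 1 was lost
completed = knn_impute(train_x + [observed], k=1)[-1]
print(completed, predict_failures(train_x, train_y, completed))
```

In this toy setting the lost measurement is filled from the most similar historical observation, and the completed feature vector is then mapped to one failure flag per link in a single prediction.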