Crash localization, an important step in debugging crashes, is challenging when dealing with an extremely large number of diverse applications, platforms, and underlying root causes. Large-scale error reporting systems, e.g., Windows Error Reporting (WER), commonly rely on manually developed rules and heuristics to localize the blamed frames that cause crashes. As new applications and features are routinely introduced and existing applications are run in new environments, developing new rules and maintaining existing ones become extremely challenging. We propose a data-driven solution to this problem. We start with the first large-scale empirical study of 362K crashes and their blamed methods, reported to WER by tens of thousands of applications running in the field. The analysis provides valuable insights into where and how crashes happen and which methods to blame for them. These insights enable us to develop DeepAnalyze, a novel multi-task sequence labeling approach for identifying blamed frames in stack traces. We evaluate our model on over a million real-world crashes from four popular Microsoft applications and show that DeepAnalyze, trained on crashes from one set of applications, not only accurately localizes crashes of those same applications but also bootstraps crash localization for other applications with zero to very little additional training data.