Building sound and precise static call graphs for real-world JavaScript applications poses an enormous challenge due to many hard-to-analyze language features. Further, the relative importance of these features may vary depending on the call graph algorithm being used and the class of applications being analyzed. In this paper, we present a technique to automatically quantify the relative importance of different root causes of call graph unsoundness for a set of target applications. The technique works by identifying the dynamic function data flows relevant to each call edge missed by the static analysis, correctly handling cases with multiple root causes and inter-dependent calls. We apply our approach to perform a detailed study of the recall of a state-of-the-art call graph construction technique on a set of framework-based web applications. The study yielded a number of useful insights. We found that while dynamic property accesses were the most common root cause of missed edges across the benchmarks, other root causes varied in importance depending on the benchmark, which is potentially useful information for an analysis designer. Further, with our approach, we could quickly identify and fix a recall issue in the call graph builder we studied, and also quickly assess whether a recent analysis technique for Node.js-based applications would be helpful for browser-based code. All of our code and data are publicly available, and many components of our technique can be re-used to facilitate future studies.
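To make the notions of recall and missed edges concrete, the following is a minimal sketch and not the technique from the paper. It shows a dynamic property access whose callee depends on a runtime string, then computes recall of a static call graph against dynamically observed edges and tallies missed edges by root cause. The edge sets, the `caller->callee` string encoding, the root-cause labels, and all function names are hypothetical and chosen for illustration. In particular, the labels are assigned by hand here, whereas the paper derives them automatically from dynamic data flows.

```javascript
// A dynamic property access: the callee depends on a runtime string, so a
// static analysis that cannot resolve `action` may miss these call edges.
const handlers = {
  save() { return "saved"; },
  load() { return "loaded"; },
};
function dispatch(action) {
  return handlers[action]();
}

// Hypothetical edge sets, encoded as "caller->callee" strings.
// dynamicEdges: edges observed at run time (the reference for recall).
// staticEdges: edges reported by some static call graph builder.
const dynamicEdges = new Set([
  "main->init",
  "main->dispatch",
  "init->setup",
  "dispatch->handlers.save",
  "dispatch->handlers.load",
]);
const staticEdges = new Set([
  "main->init",
  "main->dispatch",
  "init->setup",
  "main->unused", // extra static edge; affects precision, not recall
]);

// Hypothetical, hand-assigned root causes. One missed edge may have several.
const rootCauseOf = new Map([
  ["dispatch->handlers.save", ["dynamic property access"]],
  ["dispatch->handlers.load", ["dynamic property access", "other"]],
]);

// Recall: fraction of dynamically observed edges present in the static graph.
function recall(staticSet, dynamicSet) {
  if (dynamicSet.size === 0) return 1;
  let found = 0;
  for (const edge of dynamicSet) {
    if (staticSet.has(edge)) found++;
  }
  return found / dynamicSet.size;
}

// Edges observed at run time that the static graph lacks.
function missedEdges(staticSet, dynamicSet) {
  return [...dynamicSet].filter((edge) => !staticSet.has(edge));
}

// Count how many missed edges each root cause contributes to.
function tallyRootCauses(missed, causes) {
  const counts = new Map();
  for (const edge of missed) {
    for (const cause of causes.get(edge) || ["unknown"]) {
      counts.set(cause, (counts.get(cause) || 0) + 1);
    }
  }
  return counts;
}

const missed = missedEdges(staticEdges, dynamicEdges);
const tally = tallyRootCauses(missed, rootCauseOf);
console.log(dispatch("save"), recall(staticEdges, dynamicEdges), missed, tally);
```

Because an edge with several root causes is counted once per cause, the tallies can sum to more than the number of missed edges. This is one simple way to account for multiple root causes, and the paper's own handling may differ.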