Inter-component communication (ICC) is a widely used mechanism in mobile apps that enables message-based control-flow transfer and data passing between components. Effective ICC resolution requires precisely identifying entry points, analyzing the data values of ICC fields, modeling the related framework APIs, and so on. ICC resolution involves a variety of control-flow and data-value tracking characteristics, and oracles are lacking for real-world apps. Both factors make the practical evaluation of ICC resolution techniques challenging. To fill this gap, we collect benchmark suites of multiple types, totaling 4,104 apps and covering characteristic-specific, open-source, and commercial apps. Because the benchmarks differ, we adopt different evaluation metrics for them, based on number counts, graph structure, and reliable oracles. As oracles for real-world apps are unavailable, we design a dynamic analysis approach to extract the real ICC links triggered during GUI exploration. Overall, we manually confirm 1,680 ICCs and label 42,000 code characteristic tags to form a reliable oracle set. The evaluation of six off-the-shelf ICC resolution tools shows that the tools behave inconsistently across benchmarks and across metrics. Using the reliable oracles, we find that up to 39\%--85\% of ICCs are missed by the six tools, due to their inadequate analysis of specific code characteristics. With the help of graph metrics, we also identify many wrongly reported ICCs caused by conservative analysis or the transitivity of imprecision. Finally, we summarize eight false-negative/false-positive (FN/FP) patterns in ICC resolution to guide further improvement.
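To make the oracle-based metric concrete, the following Python sketch counts the ICCs a tool misses relative to an oracle set. It assumes, for illustration only, that an ICC link is identified by its (source component, target component) pair. `IccLink`, `compare_with_oracle`, and the component names are hypothetical and are not taken from the paper's tooling. A dynamically extracted oracle contains only the links actually triggered during GUI exploration. For that reason, the sketch treats reported links outside the oracle as unconfirmed rather than as definite false positives.

```python
from __future__ import annotations

from typing import NamedTuple


class IccLink(NamedTuple):
    """Hypothetical representation of one ICC link: source component -> target component."""
    source: str
    target: str


def compare_with_oracle(reported: set[IccLink], oracle: set[IccLink]) -> dict[str, float]:
    """Compare a tool's reported ICC links against a dynamically collected oracle set."""
    confirmed = reported & oracle    # links both reported by the tool and observed at run time
    missed = oracle - reported       # false negatives: observed links the tool did not report
    unconfirmed = reported - oracle  # only *candidate* false positives, since dynamic
                                     # exploration may not trigger every real link
    miss_rate = len(missed) / len(oracle) if oracle else 0.0
    return {
        "confirmed": len(confirmed),
        "missed": len(missed),
        "unconfirmed": len(unconfirmed),
        "miss_rate": miss_rate,
    }


# Toy example with made-up component names.
oracle = {
    IccLink("MainActivity", "DetailActivity"),
    IccLink("MainActivity", "SyncService"),
    IccLink("DetailActivity", "ShareActivity"),
}
reported = {
    IccLink("MainActivity", "DetailActivity"),
    IccLink("MainActivity", "SettingsActivity"),
}

print(compare_with_oracle(reported, oracle))
```

In this toy case the tool misses two of the three observed links. Its extra `SettingsActivity` link would still need a separate check, such as a graph-based metric or manual inspection, before it could be called a false positive.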