Native code is now commonplace within Android app packages, where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionality. Yet state-of-the-art static analysis approaches have mostly overlooked such native code, even though it may implement key sensitive, or even malicious, parts of the app behavior. This limitation is a severe threat to validity for a large range of static analyses, which do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model: we extract and merge the call graphs of native code and bytecode so that the final model is readily usable by a common Android analysis framework. In our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant number of Java methods called from native code are "unreachable" in apps' call graphs, both in goodware and malware. Using JuCify, we enabled static analyzers to reveal cases where malware relied on native code to hide the invocation of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code. Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code.
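As a rough illustration of why merging the two call graphs matters, the sketch below models each call graph as a plain adjacency mapping and adds bridge edges at the JNI boundary. It is a toy model under stated assumptions, not JuCify's implementation: the method names, the dictionary-based graph representation, and the `merge_call_graphs` and `reachable` helpers are all hypothetical, whereas the approach described above operates on Soot's intermediate representation.

```python
from collections import deque


def merge_call_graphs(*graphs):
    """Union several adjacency mappings (caller -> set of callees).

    Hypothetical helper: the inputs are left unmodified.
    """
    merged = {}
    for graph in graphs:
        for caller, callees in graph.items():
            merged.setdefault(caller, set()).update(callees)
    return merged


def reachable(call_graph, entry):
    """Return every method reachable from `entry` (breadth-first search)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical app: all names below are invented for illustration.
# Bytecode-only view: the analysis stops at the native method declaration.
bytecode_cg = {
    "MainActivity.onCreate": {"MainActivity.nativeInit"},
    "Billing.sendPayment": {"SmsManager.sendTextMessage"},
}
# Native-only view: calls between functions in the shared library.
native_cg = {
    "Java_MainActivity_nativeInit": {"do_work"},
}
# Edges across the JNI boundary, in both directions.
java_to_native = {"MainActivity.nativeInit": {"Java_MainActivity_nativeInit"}}
native_to_java = {"do_work": {"Billing.sendPayment"}}

entry = "MainActivity.onCreate"
before = reachable(bytecode_cg, entry)
unified = merge_call_graphs(bytecode_cg, native_cg, java_to_native, native_to_java)
after = reachable(unified, entry)

# Methods that only become reachable once native code is taken into account.
newly_reachable = after - before
```

In this toy setting, `Billing.sendPayment` is invisible to a bytecode-only reachability analysis because the only path to it goes through native code; it appears in `newly_reachable` once the bridge edges are added.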