The ability to identify the author responsible for a given software object is critical for many research studies and for enhancing software transparency and accountability. However, unlike in other app ecosystems such as iOS, attribution in the Android ecosystem is known to be hard. Prior research has leveraged market metadata and signing certificates to identify software authors without questioning the validity and accuracy of these attribution signals. Yet Android app authors can, either intentionally or by mistake, hide their true identity for two reasons: (1) markets do not enforce policies that ensure the accuracy and correctness of the information developers disclose in their market profiles during the app release process, and (2) apps are signed with self-signed certificates rather than certificates issued by trusted CAs. In this paper, we perform the first empirical analysis of the availability, volatility, and overall aptness of publicly available metadata for author attribution in Android app markets. To that end, we analyze a dataset of over 2.5 million market entries and apps collected from five Android markets over a period of more than two years. Our results show that widely used attribution signals are often missing from market profiles and that they change over time. We also invalidate the common belief that signing certificates are valid signals for author attribution. For instance, we find that apps from different authors share signing certificates due to the proliferation of app building frameworks and software factories. Finally, we introduce the concept of an attribution graph and apply it to evaluate the validity of existing attribution signals on the Google Play Store. Our results confirm that the lack of control over publicly available signals can confuse the attribution process.