Contemporary mobile applications (apps) are designed to track, use, and share users' data, often without their consent, which raises privacy and transparency concerns. To investigate whether mobile apps have always been (non-)transparent about how they collect user information, we perform a longitudinal analysis of the historical versions of 268 Android apps, comprising 5,240 app releases published between 2008 and 2016. We detect inconsistencies between apps' behaviors and the data collection practices stated in their privacy policies to reveal compliance issues. We use machine learning techniques to classify privacy policy text and identify the purported practices that collect and/or share users' personal information, such as phone numbers and email addresses. We then uncover each app's data leaks through static and dynamic analysis. Our results show a steady increase over time in the number of data collection practices that are not disclosed in the privacy policies. This behavior is particularly troubling since the privacy policy is the primary tool for describing an app's privacy protection practices. We find that newer versions of the apps are likely to be more non-compliant than their preceding versions. The discrepancies between the purported and the actual data practices show that privacy policies are often inconsistent with the apps' behaviors, thus undermining the 'notice and choice' principle when users install apps.
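The compliance check described in the abstract can be illustrated with a minimal sketch. It assumes that, for each release, the policy classifier yields a set of disclosed data types and the static and dynamic analysis yields a set of observed ones; the `Release` class, its field names, and the sample data are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One app version; every field here is a hypothetical stand-in."""
    version: str
    year: int
    disclosed: frozenset  # data types the policy classifier marked as declared
    observed: frozenset   # data types the static/dynamic analysis saw collected


def undisclosed(release):
    """Data types the app collects but its privacy policy does not mention."""
    return release.observed - release.disclosed


def compliance_trend(releases):
    """Return (version, undisclosed count) pairs ordered by release year."""
    ordered = sorted(releases, key=lambda r: r.year)
    return [(r.version, len(undisclosed(r))) for r in ordered]


# Made-up illustrative data, not results from the study.
history = [
    Release("2.0", 2014, frozenset({"email"}),
            frozenset({"email", "phone_number", "device_id"})),
    Release("1.0", 2010, frozenset({"email"}), frozenset({"email"})),
    Release("1.5", 2012, frozenset({"email"}),
            frozenset({"email", "phone_number"})),
]

trend = compliance_trend(history)
print(trend)
```

A rising count across successive versions in such a trend would correspond to the pattern the abstract reports, where newer releases tend to be less compliant than the ones before them.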