As smartphones have become ubiquitous, mobile applications (apps) have become part of daily life. Apps provide rich functionality, but they also access a large amount of personal information, which raises privacy concerns. To understand what personal information apps collect, many solutions have been proposed to detect privacy leaks in apps. Recently, privacy leak detection based on traffic monitoring has shown promising performance and strong scalability. However, it still has two shortcomings. First, it struggles to detect leaks of obfuscated personal information. Second, it cannot discover privacy leaks of undefined types. To address these problems, this paper proposes a new personal information detection method based on traffic monitoring. Statistical features are designed to describe the occurrence patterns of personal information in traffic, including both local and global patterns. A detector is then trained with machine learning algorithms to discover potential personal information that exhibits similar patterns. Because the statistical features are independent of the value and type of personal information, the trained detector can identify both various types of privacy leaks and obfuscated privacy leaks. To the best of our knowledge, this is the first work to detect personal information based on statistical features. Experimental results show that the proposed method achieves better performance than the state of the art.
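The idea of value-independent statistical features can be illustrated with a minimal sketch. The abstract does not define the local and global patterns, so every feature below (`mean_length`, `mean_entropy`, `occurrences_per_flow`, `values_per_device`, `distinct_values`), the function names, and the `(device_id, url)` input format are hypothetical choices made for illustration, not the features used in the paper. The sketch only shows how statistics can be computed per query-string key without inspecting what the value means; the resulting feature vectors could then be passed to any standard classifier.

```python
from collections import defaultdict
from math import log2
from urllib.parse import parse_qsl, urlsplit


def shannon_entropy(text):
    """Character-level Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    n = len(text)
    return -sum(c / n * log2(c / n) for c in counts.values())


def key_features(flows):
    """Compute illustrative statistics for each query-string key.

    `flows` is a list of (device_id, url) pairs (an assumed input format).
    Returns a dict mapping each key to a dict of hypothetical features.
    """
    values_by_key = defaultdict(list)                        # key -> all observed values
    devices_by_key = defaultdict(lambda: defaultdict(set))   # key -> device -> distinct values
    for device, url in flows:
        for key, value in parse_qsl(urlsplit(url).query):
            values_by_key[key].append(value)
            devices_by_key[key][device].add(value)

    features = {}
    for key, values in values_by_key.items():
        per_device = devices_by_key[key]
        features[key] = {
            # Assumed "local" statistics: shape of individual values.
            "mean_length": sum(map(len, values)) / len(values),
            "mean_entropy": sum(map(shannon_entropy, values)) / len(values),
            # Assumed "global" statistics: behaviour across flows and devices.
            "occurrences_per_flow": len(values) / len(flows),
            "values_per_device": sum(len(v) for v in per_device.values()) / len(per_device),
            "distinct_values": len(set(values)),
        }
    return features


if __name__ == "__main__":
    sample = [
        ("d1", "http://a.example/t?uid=abc123&ts=1"),
        ("d1", "http://a.example/t?uid=abc123&ts=2"),
        ("d2", "http://a.example/t?uid=zzz999&ts=3"),
    ]
    for key, feats in key_features(sample).items():
        print(key, feats)
```

In this toy input, the `uid` key keeps one value per device while differing between devices, whereas `ts` changes on every flow. A statistic such as `values_per_device` captures that difference without knowing that `uid` is an identifier, which is the sense in which such features do not depend on the value or type of the personal information.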