In this work, we study the privacy risk posed by profile matching across online social networks (OSNs), in which anonymous profiles of OSN users are matched to their real identities using auxiliary information about them. We consider different attributes that users share publicly. These attributes include both strong identifiers, such as user name, and weak identifiers, such as interest or sentiment variation between a user's posts on different platforms. We study the effect of different combinations of these attributes on profile matching in order to characterize the privacy threat comprehensively. The proposed framework relies mainly on machine learning techniques and optimization algorithms. We evaluate the proposed framework on three datasets (Twitter - Foursquare, Google+ - Twitter, and Flickr) and show that profiles of users in different OSNs can be matched with high probability by using the publicly shared attributes and/or the underlying graph structure of the OSNs. We also show that the proposed framework provides notably higher precision than state-of-the-art approaches that rely on machine learning techniques. We believe that this work is a valuable step toward building a tool that helps OSN users understand the privacy risks arising from the information they share publicly.